
WO2025068196A1 - Biometric recognition dataset generation - Google Patents

Biometric recognition dataset generation

Info

Publication number
WO2025068196A1
Authority
WO
WIPO (PCT)
Prior art keywords
recognition system
image
biometric recognition
context information
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/EP2024/076805
Other languages
French (fr)
Inventor
Patrick Michl
Christian Lennartz
Jevgenij JEGOROV
Lars DIESSELBERG
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
TrinamiX GmbH
Original Assignee
TrinamiX GmbH
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by TrinamiX GmbH filed Critical TrinamiX GmbH
Publication of WO2025068196A1 publication Critical patent/WO2025068196A1/en


Classifications

    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 - Classification, e.g. identification
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting

Definitions

  • the invention is in the area of generation of datasets for biometric recognition.
  • the invention relates to a method for generating datasets for training a biometric recognition system, the use of the datasets for training a biometric recognition system, a biometric recognition system trained with datasets, a system for generating datasets for training a biometric recognition system and a non-transient computer-readable medium including instructions for generating datasets for training a biometric recognition system.
  • Modern biometric recognition systems typically involve models, for example artificial neural networks. These models are trained to differentiate between an authorized user and an unauthorized user or a spoofing mask.
  • US 2020/0104570 discloses an optical face recognition system in which an image from a user is evaluated including a trained neural network.
  • the model training requires many training datasets.
  • the datasets must involve many different biometric recognition subjects each recorded at various environmental conditions, for example different distances, angles, temperatures, or light conditions.
  • the biometric capture device of the recognition system plays a role, for example for an optical scanner system, the illumination, the camera and potentially a display through which the light is transmitted must be taken into account. Recording many biometric recognition subjects under different environmental conditions with all conceivable biometric capture devices is practically impossible.
  • the present invention relates to a computer-implemented method for generating datasets for training a biometric recognition system
  • a computer-implemented method for generating datasets for training a biometric recognition system comprising a. receiving context information associated with the biometric capture device, the biometric capture environment or the biometric recognition subject, b. generating a dataset for training a biometric recognition system by providing the context information to a generative model which is configured to receive context information as input and to output a dataset for training a biometric recognition system based on the context information, and c. outputting the dataset.
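  • As an illustration only, the three steps a. to c. could be organized as in the following Python sketch; the ContextInformation fields and the GenerativeModel interface are hypothetical placeholders, not part of the claimed method.

```python
# Minimal sketch of steps a-c; names and structure are illustrative only.
from dataclasses import dataclass

@dataclass
class ContextInformation:
    # hypothetical fields covering device, environment and subject context
    projector_wavelength_nm: float = 940.0
    camera_resolution_mp: float = 1.0
    subject_distance_m: float = 0.4
    ambient_light_lux: float = 200.0

class GenerativeModel:
    """Placeholder for a trained conditional generative model."""
    def generate(self, context: ContextInformation, n_samples: int):
        # a real implementation would sample images conditioned on `context`
        raise NotImplementedError

def generate_training_dataset(model: GenerativeModel,
                              context: ContextInformation,
                              n_samples: int = 1000):
    # a. context information is received (here: passed in as an argument)
    # b. the generative model produces a dataset conditioned on the context
    dataset = model.generate(context, n_samples)
    # c. the dataset is output, e.g. returned or written to storage
    return dataset
```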
  • the present invention relates to a use of the datasets obtained by the method according to the invention for training a data-driven model of a biometric recognition system.
  • the present invention relates to a use of the datasets obtained by the method according to the invention for training a biometric recognition system.
  • the present invention relates to a method of access control to a device or application comprising a. receiving a request for accessing the device or application, b. in response to the request executing a biometric recognition with a system trained with datasets obtained by the method according to the present invention, c. granting access to the device or application depending on the outcome of the biometric recognition.
  • the present invention relates to a biometric recognition system comprising a data-driven model trained with datasets obtained by the method according to the present invention.
  • the present invention relates to a biometric recognition system trained with datasets obtained by the method according to the present invention.
  • the present invention relates to a system for generating datasets for training a biometric recognition system
  • a system for generating datasets for training a biometric recognition system comprising a. an input for receiving context information associated with hardware components of the biometric capture device, b. a processor for generating a dataset for training a biometric recognition system by providing the context information to a generative model which is a data-driven model and which is configured to receive context information as input and to output a dataset for training a biometric recognition system based on the context information, and c. an output for outputting the dataset.
  • the present invention relates to a system for generating datasets for training a biometric recognition system
  • a system for generating datasets for training a biometric recognition system comprising a. an input for receiving context information associated with the biometric capture device, the biometric capture environment or the biometric recognition subject, b. a processor for generating a dataset for training a biometric recognition system by providing the context information to a generative model which is configured to receive context information as input and to output a dataset for training a biometric recognition system based on the context information, and c. an output for outputting the dataset.
  • the present invention relates to a non-transient computer-readable medium including instructions that, when executed by one or more processors, cause the one or more processors to perform a method comprising a. receiving context information associated with hardware components of the biometric capture device, b. generating a dataset for training a biometric recognition system by providing the context information to a generative model which is a data-driven model and which is configured to receive context information as input and to output a dataset for training a biometric recognition system based on the context information, and c. outputting the dataset.
  • the present invention relates to a non-transient computer-readable medium including instructions that, when executed by one or more processors, cause the one or more processors to perform a method comprising a. receiving context information associated with the biometric capture device, the biometric capture environment or the biometric recognition subject, b. generating a dataset for training a biometric recognition system by providing the context information to a generative model which is configured to receive context information as input and to output a dataset for training a biometric recognition system based on the context information, and c. outputting the dataset.
  • the present invention relates to a generative model which is configured to receive context information as input and to output a dataset for training a biometric recognition system based on the context information.
  • a biometric recognition system may be implemented by multimodal approaches that involve separate models, specialized on different biometric features. These models may be trained to collaboratively ensure the identity of a biometric recognition subject and protect it against spoofing-attacks.
  • the training of the specialized models requires comprehensive datasets which sufficiently represent the respectively assessed biometric features within the considered target population.
  • the biometric recognition system is more robust against environmental conditions, for example different distances, angles, temperatures, or light conditions. The demand for recording test persons under various conditions is drastically reduced, so the biometric recognition system is more rapidly ready for use.
  • the biometric recognition system can produce more reliable results for biometric recognition subjects with an unusual appearance, for example stains, scars or other deformations due to an accident or a disease. Such rare appearances can hardly be taken into account from biometric captures of real humans as it is usually too difficult to acquire enough representative test persons.
  • Biometric recognition may refer to any procedure which uses a characteristic of a human to identify the user.
  • Biometric recognition may comprise identity recognition, i.e. whether the correct person is in front of the biometric capture device, and authentication, i.e. whether a real person is in front of the biometric capture device and not a spoofing object which looks identical to the correct person.
  • Biometric recognition may include optical biometric recognition like face recognition, iris scan, palm scan or fingerprint scan; or acoustic recognition like voice recognition.
  • Optical biometric recognition may be passive, i.e. an image is recorded of the user or part of the user to be recognized, wherein the user is only under irradiation of ambient light.
  • Optical biometric recognition may be active, i.e. an image is recorded of the user or part of the user to be recognized, wherein the user is under irradiation of light emitted by a projector.
  • the term “light” may refer to electromagnetic radiation in one or more of the infrared, the visible and the ultraviolet spectral range.
  • the term “ultraviolet spectral range” generally refers to electromagnetic radiation having a wavelength of 1 nm to 380 nm, preferably of 100 nm to 380 nm.
  • the term “visible spectral range” generally refers to a spectral range of 380 nm to 760 nm.
  • IR infrared spectral range
  • NIR near infrared spectral range
  • MidIR mid infrared spectral range
  • FIR far infrared spectral range
  • light used for the typical purposes of the present invention is light in the infrared (IR) spectral range, more preferred in the near infrared (NIR) and/or the mid infrared (MidIR) spectral range, especially light having a wavelength of 1 μm to 5 μm, preferably of 1 μm to 3 μm.
  • An optical biometric recognition system may comprise a projector.
  • the term "projector” may refer to a device configured for generating or providing light in the sense of the above-mentioned definition.
  • the projector may be a pattern projector, a floodlight projector or both, either simultaneously or by repeatedly switching from illuminating patterned light to floodlight.
  • pattern projector may refer to a device configured for generating or providing at least one light pattern, in particular at least one infrared light pattern.
  • the term "light pattern” may refer to at least one pattern comprising a plurality of light spots.
  • the light spot may be at least partially spatially extended. At least one spot or any spot may have an arbitrary shape. In some cases, a circular shape of at least one spot or any spot may be preferred.
  • the spots may be arranged by considering a structure of a display comprised by a device that further comprises the optoelectronic apparatus. Typically, an arrangement of an OLED-pixel-structure of the display may be considered.
  • the term "infrared light pattern” may refer to a light pattern comprising spots in the infrared spectral range.
  • the infrared light pattern may be a near infrared light pattern.
  • the infrared light may be coherent.
  • the infrared light pattern may be a coherent infrared light pattern.
  • the pattern projector may be configured for emitting monochromatic light, e.g. in the near infrared region.
  • monochromatic may refer to light with a wavelength accuracy of less than or equal to ± 2 % or less than or equal to ± 1 %.
  • the wavelength accuracy may be the maximum difference of emitted wavelength relative to the mean wavelength.
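  • As an illustration, the monochromaticity criterion described above can be checked as in the following sketch, assuming the wavelength accuracy is the maximum deviation of the emitted wavelength from the mean wavelength, relative to the mean wavelength.

```python
# Illustrative check of the "monochromatic" criterion (assumed interpretation).
def wavelength_accuracy(wavelengths_nm):
    mean = sum(wavelengths_nm) / len(wavelengths_nm)
    # maximum deviation from the mean, relative to the mean wavelength
    return max(abs(w - mean) for w in wavelengths_nm) / mean

# e.g. a nominal 940 nm emitter varying between 931 nm and 949 nm
print(wavelength_accuracy([931.0, 940.0, 949.0]) <= 0.02)  # True (~1 %)
```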
  • the pattern projector may be adapted to emit light with a plurality of wavelengths, e.g. for allowing additional measurements in other wavelengths channels.
  • the infrared light pattern may comprise at least one regular and/or constant and/or periodic pattern such as a triangular pattern, a rectangular pattern, a hexagonal pattern or a pattern comprising further convex tilings.
  • the infrared light pattern is a hexagonal pattern, preferably a hexagonal infrared light pattern, preferably a 2/5 hexagonal infrared light pattern.
  • Using a periodical 2/5 hexagonal pattern can allow distinguishing between artefacts and usable signal.
  • the light pattern may comprise less than 4000 spots, for example less than 3000 spots or less than 2000 spots or less than 1500 spots or less than 1000 spots.
  • the light pattern may comprise patterned coherent infrared light of less than 4000 spots or less than 3000 spots or less than 2000 spots or less than 1500 spots or less than 1000 spots.
  • At least one of the infrared light spots may be associated with a beam divergence of 0.2° to 0.5°, preferably 0.1 ° to 0.3°.
  • beam divergence may refer to at least one measure of an increase in at least one diameter and/or at least one diameter equivalent, such as a radius, with a distance from an optical aperture from which the beam emerges.
  • the measure may be an angle or an angle equivalent.
  • a beam divergence may be determined at 1/e².
  • the projector may comprise at least one pattern projector configured for generating the infrared light pattern.
  • the pattern projector may comprise at least one emitter, in particular a plurality of emitters.
  • the term "emitter” may refer to at least one arbitrary device configured for providing at least one light beam. The light beam may generate the infrared light pattern.
  • the emitter may comprise at least one element selected from the group consisting of at least one laser source such as at least one semi-conductor laser, at least one double heterostructure laser, at least one external cavity laser, at least one separate confinement heterostructure laser, at least one quantum cascade laser, at least one distributed Bragg reflector laser, at least one polariton laser, at least one hybrid silicon laser, at least one extended cavity diode laser, at least one quantum dot laser, at least one volume Bragg grating laser, at least one Indium Arsenide laser, at least one Gallium Arsenide laser, at least one transistor laser, at least one diode pumped laser, at least one distributed feedback laser, at least one quantum well laser, at least one interband cascade laser, at least one semiconductor ring laser, at least one vertical cavity surface emitting laser (VCSEL); at least one non-laser light source such as at least one LED or at least one light bulb.
  • the pattern projector comprises at least one VCSEL, preferably a plurality of VCSELs.
  • the plurality of VCSELs may be arranged in at least one array, e.g. comprising a matrix of VCSELs.
  • the VCSELs may be arranged on the same substrate, or on different substrates.
  • the term "vertical-cavity surface-emitting laser” may refer to a semiconductor laser diode configured for laser beam emission perpendicular with respect to a top surface. Examples for VCSELs can be found e.g. in en.wikipedia.org/wikiA/erticalcavity_surface-emitting_laser.
  • VCSELs are generally known to the skilled user such as from WO 2017/222618 A.
  • Each of the VCSELs is configured for generating at least one light beam.
  • the plurality of generated spots may be associated with the infrared light pattern.
  • the VCSELs may be configured for emitting light beams at a wavelength range from 800 to 1000 nm.
  • the VCSELs may be configured for emitting light beams at 808 nm, 850 nm, 940 nm, and/or 980 nm.
  • the VCSELs may emit light at 940 nm, since terrestrial solar radiation has a local minimum in irradiance at this wavelength, e.g. as described in CIE 085-1989 “Solar spectral irradiance”.
  • the pattern projector may comprise at least one optical element configured for increasing, e.g. duplicating, the number of spots generated by the pattern projector.
  • the pattern projector, particularly the optical element, may comprise at least one diffractive optical element (DOE) and/or at least one meta surface element.
  • DOE diffractive optical element
  • the DOE and/or the meta surface element may be configured for generating multiple light beams from a single incoming light beam. Further arrangements, particularly comprising a different number of projecting VCSELs and/or at least one different optical element configured for increasing the number of spots, may be possible. Other multiplication factors are possible. For example, a VCSEL or a plurality of VCSELs may be used and the generated laser spots may be duplicated by using at least one DOE.
  • the pattern projector may comprise at least one transfer device.
  • the term “transfer device”, also denoted as “transfer system”, may refer to one or more optical elements which are adapted to modify the light beam, particularly the light beam used for generating at least a portion of the infrared light pattern, such as by modifying one or more of a beam parameter of the light beam, a width of the light beam or a direction of the light beam.
  • the transfer device may comprise at least one imaging optical device.
  • the transfer device specifically may comprise one or more of: at least one lens, for example at least one lens selected from the group consisting of at least one focus-tunable lens, at least one aspheric lens, at least one spherical lens, at least one Fresnel lens; at least one diffractive optical element; at least one concave mirror; at least one beam deflection element, preferably at least one mirror; at least one beam splitting element, preferably at least one of a beam splitting cube or a beam splitting mirror; at least one multi lens system; at least one holographic optical element; at least one meta optical element.
  • the transfer device comprises at least one refractive optical lens stack.
  • the transfer device may comprise a multi-lens system having refractive properties.
  • the pattern projector may be configured for emitting modulated or non-modulated light.
  • the different emitters may have different modulation frequencies, e.g. which can be used for distinguishing the light beams.
  • the light beam or light beams generated by the pattern projector may propagate parallel to an optical axis.
  • the pattern projector may comprise at least one reflective element, preferably at least one prism, for deflecting the illuminating light beam onto the optical axis.
  • the light beam or light beams, such as the laser light beam, and the optical axis may include an angle of less than 10°, preferably less than 5° or even less than 2°. Other embodiments, however, are feasible. Further, the light beam or light beams may be on the optical axis or off the optical axis.
  • the light beam or light beams may be parallel to the optical axis having a distance of less than 10 mm to the optical axis, preferably less than 5 mm to the optical axis or even less than 1 mm to the optical axis or may even coincide with the optical axis.
  • the term “flood projector” may refer to at least one device configured for providing substantially continuous spatial illumination.
  • the flood projector may illuminate a measurement area, such as a user, a portion of the user and/or a face of the user, with a spatially constant or essentially constant illumination intensity.
  • the term “flood light” may refer to substantially continuous spatial illumination, in particular diffuse and/or uniform illumination.
  • the flood light has a wavelength in the infrared range, in particular in the near infrared range.
  • the flood projector may comprise at least one VCSEL, preferably a plurality of VCSELs, for example an array of VCSELs.
  • substantially continuous spatial illumination may refer to uniform spatial illumination, wherein areas of non-uniform illumination are possible.
  • a relative distance between the flood projector and the pattern projector may be below 3.0 mm.
  • the relative distance between the flood projector and the pattern projector may be below 2.5 mm, preferably below 2.0 mm.
  • the pattern projector and the flood projector may be combined into one module.
  • the pattern projector and the flood projector may be arranged on the same substrate, in particular having a minimum relative distance.
  • the minimum relative distance may be defined by a physical extension of the flood projector and the pattern projector. Arranging the pattern projector and the flood projector having a relative distance below 3.0 mm can result in decreased space requirement of the two projectors. In particular, said projectors can even be combined into one module.
  • the pattern projector and the flood projector may comprise at least one VCSEL, preferably a plurality of VCSELs, for example an array of VCSELs.
  • the pattern projector may comprise a plurality of first VCSELs mounted on a first platform.
  • the flood projector may comprise a plurality of second VCSELs mounted on a second platform. The second platform may be beside the first platform.
  • the optoelectronic apparatus may comprise a heat sink. Above the heat sink a first increment comprising the first platform may be attached. Above the heat sink a second increment comprising the second platform may be attached.
  • the second increment may be different from the first increment.
  • the first platform may be more distant to the optical element configured for increasing, e.g. duplicating, the number of spots.
  • the second platform may be closer to the optical element.
  • the beams emitted from the second VCSELs may be defocused and thus form overlapping spots. This leads to a substantially continuous illumination and, thus, to flood illumination.
  • the projector may be positioned such that it can emit light through the transparent display. Hence, light emitted by the projector may cross the transparent display before it impinges on the user. From the user's view, the projector may be placed behind the transparent display.
  • An optical biometric recognition system may comprise a camera.
  • the term "camera” may refer to at least one unit of the optoelectronic apparatus configured for generating at least one image.
  • the image may be generated via a hardware and/or a software interface, which may be considered as the camera.
  • image generation may refer to capturing and/or generating and/or determining and/or recording at least one image by using the camera.
  • the image generation may comprise imaging and/or recording the image.
  • the image generation may comprise capturing a single image and/or a plurality of images such as a sequence of images.
  • the capturing and/or generating and/or determining and/or recording of the image may be caused and/or initiated by the hardware and/or the software interface.
  • the image generation may comprise recording continuously a sequence of images such as a video or a movie.
  • the image generation may be initiated by a user action or may automatically be initiated, e.g. once the presence of at least one object or user within a field of view and/or within a predetermined sector of the field of view of the camera is automatically detected.
  • the camera may comprise at least one optical sensor, in particular at least one pixelated optical sensor.
  • the camera may comprise at least one CMOS sensor or at least one CCD chip.
  • the camera may comprise at least one CMOS sensor, which may be sensitive in the infrared spectral range.
  • image may refer to data recorded by using the optical sensor, such as a plurality of electronic readings from the CMOS or CCD chip.
  • the image may comprise raw image data or may be a pre-processed image.
  • the pre-processing may comprise applying at least one filter to the raw image data and/or at least one background correction and/or at least one background subtraction.
  • the camera may comprise a color camera, e.g. comprising at least color pixels.
  • the camera may comprise a color CMOS camera.
  • the camera may comprise black and white pixels and color pixels.
  • the color pixels and the black and white pixels may be combined internally in the camera.
  • the camera may comprise a color camera (e.g. RGB) or a black and white camera, such as a black and white CMOS.
  • the camera may comprise a black and white CMOS chip.
  • the camera generally may comprise a one-dimensional or two-dimensional array of image sensors, such as pixels.
  • the color camera may be an internal and/or external camera of a device comprising the optoelectronic apparatus.
  • the internal and/or external camera of the device may be accessed via a hardware and/or a software interface comprised by the optoelectronic apparatus, which is used as the camera.
  • the device is or comprises a smartphone
  • the image generating unit may be a front camera, such as a selfie camera, and/or back camera of the smartphone.
  • the camera may have a field of view between 10°x10° and 75°x75°, preferably 55°x65°.
  • the camera may have a resolution below 2 MP, preferably between 0.3 MP and 1.5 MP.
  • the camera may comprise further elements, such as one or more optical elements, e.g. one or more lenses.
  • the optical sensor may be a fix-focus camera, having at least one lens which is fixedly adjusted with respect to the camera.
  • the camera may also comprise one or more variable lenses which may be adjusted, automatically or manually.
  • Other cameras are feasible.
  • pattern image may refer to an image generated by the camera while illuminating the infrared light pattern, e.g. on an object and/or a user.
  • the pattern image may comprise an image showing a user, in particular at least parts of the face of the user, while the user is being illuminated with the infrared light pattern, particularly on a respective area of interest comprised by the image.
  • the pattern image may be generated by imaging and/or recording light reflected by an object and/or user which is illuminated by the infrared light pattern.
  • the pattern image showing the user may comprise at least a portion of the illuminated infrared light pattern on at least a portion of the user.
  • the illumination by the pattern illumination source and the imaging by using the optical sensor may be synchronized, e.g. by using at least one control unit of the optoelectronic apparatus.
  • the term "flood image” may refer to an image generated by the camera while illumination source is illuminating infrared flood light, e.g. on an object and/or a user.
  • the flood image may comprise an image showing a user, in particular the face of the user, while the user is being illuminated with the flood light.
  • the flood image may be generated by imaging and/or recording light reflected by an object and/or user which is illuminated by the flood light.
  • the flood image showing the user may comprise at least a portion of the flood light on at least a portion of the user.
  • the illumination by the flood illumination source and the imaging by using the optical sensor may be synchronized, e.g. by using at least one control unit of the optoelectronic apparatus.
  • the camera may be configured for imaging and/or recording the pattern image and the flood image at the same time or at different times.
  • the camera may be configured for imaging and/or recording the pattern image and the flood image at at least partially overlapping measurement areas or equivalents of the measurement areas.
  • the optical biometric recognition system may comprise a transparent display.
  • the camera or the projector may be placed behind the transparent display in order to maximize the display area of a device.
  • the term "display” may refer to an arbitrary shaped device configured for displaying an item of information.
  • the item of information may be arbitrary information such as at least one image, at least one diagram, at least one histogram, at least one graphic, text, numbers, at least one sign, or an operating menu.
  • the display may be or may comprise at least one screen.
  • the display may have an arbitrary shape, e.g. a rectangular shape.
  • the display may be a front display of the device.
  • the display may be or may comprise at least one organic light-emitting diode (OLED) display.
  • organic light emitting diode may refer to a light-emitting diode (LED) in which an emissive electroluminescent layer is a film of organic compound configured for emitting light in response to an electric current.
  • the OLED display may be configured for emitting visible light.
  • the display, particularly a display area may be covered by glass.
  • the display may comprise at least one glass cover.
  • the transparent display may be at least partially transparent.
  • the term "at least partially transparent” may refer to a property of the display to allow light, in particular of a certain wavelength range, e.g. in the infrared spectral region, in particular in the near infrared spectral region, to pass at least partially through.
  • the display may be semitransparent in the near infrared region.
  • the display may have a transparency of 20 % to 50 % in the near infrared region.
  • the display may have a different transparency for other wavelength ranges.
  • the display may have a transparency of > 80 % for the visible spectral range, preferably > 90 % for the visible spectral range.
  • the transparent display may be at least partially transparent over the entire display area or only parts thereof. Typically, it is sufficient if only those parts of the display area are at least partially transparent through which light needs to pass from the projector or to the camera.
  • the display comprises a display area.
  • the term "display area” may refer to an active area of the display, in particular an area which is activatable.
  • the display may have additional areas such as recesses or cutouts.
  • the display may have a first area associated with a first pixel per inch (PPI) value and a second area associated with a second PPI value.
  • the first PPI value may be lower than the second PPI value, preferably the first PPI value is equal to or below 400 PPI, more preferably the second PPI value may be equal to or higher than 300 PPI.
  • the first PPI value may be associated with the at least one continuous area being at least partially transparent.
  • Optical biometric recognition may comprise identifying the user based on the flood image.
  • the term "identifying” may refer to identity check and/or verifying an identity of the user.
  • the identifying of the user may comprise analyzing the flood image.
  • the analyzing of the flood image may comprise performing a face verification of the imaged face to be the user's face.
  • the analyzing may comprise one or more of the following: a filtering; a selection of at least one region of interest; a formation of a difference image between the flood image and at least one offset; an inversion of the flood image; a background correction; a decomposition into color channels; a decomposition into hue, saturation, and brightness channels; a frequency decomposition; a singular value decomposition; applying a Canny edge detector; applying a Laplacian of Gaussian filter; applying a Difference of Gaussian filter; applying a Sobel operator; applying a Laplace operator; applying a Scharr operator; applying a Prewitt operator; applying a Roberts operator; applying a Kirsch operator; applying a high-pass filter; applying a low-pass filter; applying a Fourier transformation; applying a Radon transformation; applying a Hough transformation; applying a wavelet transformation; a thresholding; creating a binary image.
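  • A minimal sketch of a few of the listed analysis operations is shown below, using OpenCV and NumPy as assumed dependencies and assuming an 8-bit grayscale or BGR input image; the parameter values are illustrative only.

```python
# Sketch of a subset of the analysis operations; parameters are illustrative.
import cv2
import numpy as np

def preprocess_flood_image(flood_image: np.ndarray) -> dict:
    # convert to a single-channel image if a color image is supplied
    gray = flood_image
    if flood_image.ndim == 3:
        gray = cv2.cvtColor(flood_image, cv2.COLOR_BGR2GRAY)
    # background correction using a blurred version of the image as offset
    background = cv2.GaussianBlur(gray, (51, 51), 0)
    corrected = cv2.subtract(gray, background)
    # edge maps: Canny detector, Sobel operator, Laplace operator
    edges = cv2.Canny(gray, 100, 200)
    sobel_x = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    laplacian = cv2.Laplacian(gray, cv2.CV_64F)
    # thresholding to obtain a binary image (Otsu's method)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return {"corrected": corrected, "edges": edges,
            "sobel_x": sobel_x, "laplacian": laplacian, "binary": binary}
```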
  • the region of interest may be determined manually by a user or may be determined automatically, such as by recognizing the user within the image.
  • the analyzing of the flood image may comprise using at least one image recognition technique, in particular a face recognition technique.
  • An image recognition technique comprises at least one process of identifying the user in an image.
  • the image recognition may comprise using at least one technique selected from the group consisting of: color-based image recognition, e.g. using features such as hue, saturation, and value (HSV) or red, green, blue (RGB); template matching, for example as illustrated on https://www.mathworks.com/help/vision/ug/pattern-matching.html; image segmentation and/or blob analysis, e.g. using size, color, or shape; machine learning and/or deep learning, e.g. using at least one convolutional neural network.
  • HSV hue, saturation, and value
  • RGB red, green, blue
  • the neural network may be trained by the user, such as in a training procedure, in which the user is indicated to take at least one or a plurality of pictures showing himself.
  • the analyzing of the flood image may comprise determining a plurality of facial features.
  • the analyzing may comprise comparing, in particular matching, the determined facial features with template features.
  • the template features may be features extracted from at least one template.
  • the template may be or may comprise at least one image generated in an enrollment process, e.g. when initializing the authentication system. The template may be an image of an authorized user.
  • the template features and/or the facial feature may comprise a vector.
  • Matching of the features may comprise determining a distance between the vectors.
  • the identifying of the user may comprise comparing the distance of the vectors to at least one predefined limit, wherein the user is successfully identified in case the distance is smaller than or equal to the predefined limit at least within tolerances. The user is declined and/or rejected otherwise.
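  • The feature-matching step described above may, for example, be sketched as follows; the Euclidean distance and the threshold value are illustrative choices, not the claimed implementation.

```python
# Minimal sketch of vector-based feature matching against a template.
import numpy as np

def is_identified(facial_features: np.ndarray,
                  template_features: np.ndarray,
                  limit: float = 1.1) -> bool:
    # Euclidean distance between the two embeddings
    distance = float(np.linalg.norm(facial_features - template_features))
    # the user is identified if the distance does not exceed the limit
    return distance <= limit
```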
  • the image recognition may comprise using at least one model, in particular a trained model comprising at least one face recognition model.
  • the analyzing of the flood image may be performed by using a face recognition system, such as FaceNet, e.g. as described in Florian Schroff, Dmitry Kalenichenko, James Philbin, "FaceNet: A Unified Embedding for Face Recognition and Clustering”, arXiv: 1503.03832.
  • the trained model may comprise at least one convolutional neural network.
  • the convolutional neural network may be designed as described in M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks”, CoRR, abs/1311.2901, 2013, and may be trained on face databases such as the Labeled Faces in the Wild database described in E. Learned-Miller et al., "Labeled faces in the wild: A database for studying face recognition in unconstrained environments”, Technical Report 07-49, University of Massachusetts, Amherst, October 2007, the Youtube® Faces Database as described in L. Wolf, T. Hassner, and I. Maoz, "Face recognition in unconstrained videos with matched background similarity”, in IEEE Conf. on CVPR, 2011, or the Google® Facial Expression Comparison dataset.
  • the training of the convolutional neural network may be performed as described in Florian Schroff, Dmitry Kalenichenko, James Philbin, "FaceNet: A Unified Embedding for Face Recognition and Clustering”, arXiv: 1503.03832.
  • Image artifacts caused by diffraction of the light when passing the transparent display may be corrected.
  • the term "correct” may mean partially or fully remove the artifacts or tag them so they can be excluded from further processing, in particular from determine if the imaged user is an authorized user.
  • Correcting image artifacts may take into account the information about the transparent display, in particular the dimensions of the pixels or the distance of repeating features to each other. This information can facilitate identifying artifacts as diffraction patterns can be calculated and compared to the image.
  • Correcting image artifacts may comprise identifying reflection features, sorting them by brightness and selecting the locally brightest features.
  • the information of the transparent display may be used, in particular a distance in the image by which a light beam may be displaced by diffraction on the transparent display may be calculated based on the information about the transparent display. This method can be particularly useful for pattern images. Further details are disclosed in WO 2021/105265 A1.
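  • As a hedged illustration of how a diffraction displacement could be estimated from display information, the following sketch uses a simple grating model (sin θ = m·λ/d) and a pinhole camera with focal length f; it is an approximation for illustration only, not the correction method of WO 2021/105265 A1.

```python
# Illustrative estimate of the in-image displacement caused by diffraction
# at a periodic display structure with pitch d (simple grating assumption).
import math

def diffraction_displacement_px(wavelength_m: float,
                                pixel_pitch_m: float,
                                focal_length_m: float,
                                sensor_pixel_size_m: float,
                                order: int = 1) -> float:
    sin_theta = order * wavelength_m / pixel_pitch_m
    if abs(sin_theta) >= 1.0:
        return float("inf")  # this diffraction order does not propagate
    theta = math.asin(sin_theta)
    displacement_m = focal_length_m * math.tan(theta)  # shift on the sensor
    return displacement_m / sensor_pixel_size_m        # shift in pixels

# e.g. 940 nm light, 50 µm display pitch, 4 mm focal length, 2 µm pixels
print(diffraction_displacement_px(940e-9, 50e-6, 4e-3, 2e-6))
```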
  • Optical biometric recognition may comprise determining material data, for example based on the pattern image. Particularly by considering the material as a parameter for validating the authentication process, the authentication process may be robust against being outwitted by using a recorded image of the user.
  • Extracting the material data from the pattern image may be performed by beam profile analysis of the light spots.
  • Beam profile analysis can allow for providing a reliable classification of scenes based on a few light spots.
  • Each of the light spots of the pattern image may comprise a beam profile.
  • the term "beam profile” may generally refer to at least one intensity distribution of the light spot on the optical sensor as a function of the pixel.
  • the beam profile may be selected from the group consisting of a trapezoid beam profile; a triangle beam profile; a conical beam profile and a linear combination of Gaussian beam profiles.
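  • The following sketch illustrates one possible way to extract a beam profile from a single light spot as a radial intensity distribution; the subsequent material classification is not shown, and the implementation is illustrative only.

```python
# Illustrative extraction of a radial beam profile from a cropped spot image.
import numpy as np

def radial_beam_profile(spot: np.ndarray, n_bins: int = 20) -> np.ndarray:
    # locate the spot centre at the brightest pixel
    cy, cx = np.unravel_index(np.argmax(spot), spot.shape)
    yy, xx = np.indices(spot.shape)
    radius = np.hypot(yy - cy, xx - cx)
    bins = np.linspace(0.0, radius.max(), n_bins + 1)
    profile = np.zeros(n_bins)
    for i, (lo, hi) in enumerate(zip(bins[:-1], bins[1:])):
        mask = (radius >= lo) & (radius < hi)
        if mask.any():
            profile[i] = spot[mask].mean()  # mean intensity in this annulus
    return profile
```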
  • extracting material data from the pattern image may comprise generating the material type and/or data derived from the material type.
  • extracting material data may be based on the pattern image.
  • Material data may be extracted by using at least one material model. Extracting material data may include providing the pattern image to a material model and/or receiving material data from the material model.
  • Providing the image to a material model may comprise, or may be followed by, receiving the pattern image at an input layer of the material model or via a material model loss function.
  • the material model may be a data-driven model.
  • The data-driven model may comprise a convolutional neural network and/or an encoder-decoder structure such as an autoencoder.
  • generating a representation may be based on FFT, wavelets, or deep learning approaches such as CNNs, energy-based models, normalizing flows, GANs, vision transformers or transformers used for natural language processing, autoregressive image modeling, or deep autoencoders.
  • Supervised or unsupervised schemes may be applicable to generate a representation, also called an embedding, e.g. in a cosine or Euclidean metric, in machine-learning language.
  • the data-driven model may be parametrized according to a training data set including at least one image and material data, preferably at least one pattern image and material data.
  • extracting material data may include providing the image to a material model and/or receiving material data from the material model.
  • the data-driven model may be trained according to a training data set including at least one image and material data.
  • the data-driven model may be parametrized according to a training data set including at least one image and material data.
  • the data-driven model may be parametrized according to a training data set to receive the image and provide material data based on the received image.
  • the data-driven model may be trained according to a training data set to receive the image and provide material data as output based on the received image.
  • the training data set may comprise at least one image and material data, preferably material data associated with the at least one image.
  • the image may comprise a representation of the image.
  • the representation may be a lower dimensional representation of the image.
  • the representation may comprise at least a part of the data or the information associated with the image.
  • the representation of an image may comprise a feature vector.
  • determining a representation, in particular a lower-dimensional representation, may be based on principal component analysis (PCA) mapping or radial basis function (RBF) mapping. Determining a representation may also be referred to as generating a representation. Generating a representation based on PCA mapping may include clustering based on features in the pattern image and/or partial image. Additionally or alternatively, generating a representation may be based on neural network structures suitable for reducing dimensionality. Neural network structures suitable for reducing dimensionality may comprise an encoder and/or a decoder. In an example, the neural network structure may be an autoencoder.
  • neural network structure may comprise a convolutional neural network (CNN).
  • the CNN may comprise at least one convolutional layer and/or at least one pooling layer.
  • CNNs may reduce the dimensionality of a partial image and/or an image by applying a convolution, e.g. based on a convolutional layer, and/or by pooling. Applying a convolution may be suitable for selecting features related to material information of the pattern image.
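  • A minimal sketch of generating a lower-dimensional representation via PCA, one of the options mentioned above, is given below; an autoencoder or CNN could be substituted, and the code makes no claim to match the patented pipeline.

```python
# Illustrative PCA-based feature vectors for a stack of images (pure NumPy).
import numpy as np

def pca_representation(images: np.ndarray, n_components: int = 16) -> np.ndarray:
    # images: array of shape (n_samples, height, width)
    flat = images.reshape(len(images), -1).astype(np.float64)
    flat -= flat.mean(axis=0, keepdims=True)              # centre the data
    _, _, vt = np.linalg.svd(flat, full_matrices=False)   # principal axes
    return flat @ vt[:n_components].T                     # feature vectors
```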
  • a material model may be suitable for determining an output based on an input.
  • material model may be suitable for determining material data based on an image as input.
  • a material model may be a deterministic model, a data-driven model or a hybrid model.
  • the deterministic model preferably reflects physical phenomena in mathematical form, e.g. including first-principles models.
  • a deterministic model may comprise a set of equations that describe an interaction between the material and the patterned electromagnetic radiation thereby resulting in a condition measure, a vital sign measure or the like.
  • a data-driven model may be a classification model.
  • a hybrid model may be a classification model comprising at least one machine-learning architecture with deterministic or statistical adaptations and model parameters. Statistical or deterministic adaptations may be introduced to improve the quality of the results since those provide a systematic relation between empiricism and theory.
  • the data-driven model may be a classification model.
  • the classification model may comprise at least one machine-learning architecture and model parameters.
  • the machine-learning architecture may be or may comprise one or more of: linear regression, logistic regression, random forest, piecewise linear or nonlinear classifiers, support vector machines, naive Bayes classifiers, nearest neighbors, neural networks, convolutional neural networks, generative adversarial networks, or gradient boosting algorithms or the like.
  • the material model can be a multi-scale neural network or a recurrent neural network (RNN) such as, but not limited to, a gated recurrent unit (GRU) recurrent neural network or a long short-term memory (LSTM) recurrent neural network.
  • RNN recurrent neural network
  • GRU gated recurrent unit
  • LSTM long short-term memory
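  • For illustration, a material classification model could be sketched as a small convolutional neural network as follows; PyTorch is an assumed dependency, and the architecture and class labels are illustrative only.

```python
# Hedged sketch of a material classifier; a small CNN is one option listed above.
import torch
import torch.nn as nn

class MaterialClassifier(nn.Module):
    def __init__(self, n_materials: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, n_materials)  # e.g. skin vs. non-skin

    def forward(self, pattern_image: torch.Tensor) -> torch.Tensor:
        x = self.features(pattern_image).flatten(1)
        return self.head(x)  # logits over material classes
```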
  • the data-driven model may be trained based on the training data set.
  • Training the material model may include parametrizing the material model.
  • the term training may also be denoted as learning.
  • the term specifically may refer to a process of building the classification model, in particular determining and/or updating parameters of the classification model. Updating parameters of the classification model may also be referred to as retraining. Retraining may be included when referring to training herein.
  • the training data set may include at least one image and material information.
  • extracting material data from the image with a data-driven model may comprise providing the image to a data-driven model. Additionally or alternatively, extracting material data from the image with a data-driven model may comprise generating an embedding associated with the image based on the data-driven model.
  • An embedding may refer to a lower dimensional representation associated with the image such as a feature vector. Feature vector may be suitable for suppressing the background while maintaining the material signature indicating the material data.
  • background may refer to information independent of the material signature and/or the material data. Further, background may refer to information related to biometric features such as facial features.
  • Material data may be determined with the data-driven model based on the embedding associated with the image.
  • extracting material data from the image by providing the image to a data-driven model may comprise transforming the image into material data, in particular a material feature vector indicating the material data.
  • material data may comprise further the material feature vector and/or material feature vector may be used for determining material data.
  • authentication process may be validated based on the extracted material data.
  • the validating based on the extracted material data may comprise determining if the extracted material data corresponds to the desired material data. Determining if the extracted material data matches the desired material data may be referred to as validating. Allowing or declining the user and/or object to perform at least one operation on the device that requires authentication based on the material data may comprise validating the authentication or authentication process. Validating may be based on material data and/or the image. Determining if the extracted material data corresponds to the desired material data may comprise determining a similarity of the extracted material data and the desired material data. Determining a similarity of the extracted material data and the desired material data may comprise comparing the extracted material data with the desired material data. Desired material data may refer to predetermined material data.
  • desired material data may be skin. It may be determined if material data may correspond to the desired material data.
  • skin as desired material data may be compared with non-skin material or silicone as material data and the result may be declination since silicone or non-skin material is different from skin.
  • the authentication process or its validation may include generating at least one feature vector from the material data and matching the material feature vector with associate reference template vector for material.
  • the authentication unit may be configured for authenticating the user in case the user can be identified and/or if the material data matches the desired material data.
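  • The validation logic described above may be illustrated as follows; the material labels and the combination rule are illustrative assumptions.

```python
# Sketch: authentication succeeds only if the user is identified and the
# extracted material data matches the desired material data (here: "skin").
def authenticate(user_identified: bool,
                 extracted_material: str,
                 desired_material: str = "skin") -> bool:
    material_ok = extracted_material.lower() == desired_material.lower()
    return user_identified and material_ok

# a silicone mask would be declined even if the face itself matches
print(authenticate(True, "silicone"))  # False
print(authenticate(True, "skin"))      # True
```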
  • the device may comprise at least one authorization unit configured for allowing the user to perform at least one operation on the device, e.g. unlocking the device, in case of successful authentication of the user or declining the user to perform at least one operation on the device in case of non-successful authentication. Thereby, the user may become aware of the result of the authentication.
  • Biometric recognition typically involves at least one data-driven model which receives recorded biometric data, for example an image, as input and outputs recognition information, for example a classifier indicating if the biometric recognition subject was recognized or not, or a classifier indicating if the image shows a real human or a spoofing mask.
  • a data-driven model requires training datasets allowing adjustment of its parameters such that the data-driven model yields output of sufficient accuracy.
  • the training dataset may comprise an image.
  • the image may be a flood image or a pattern image as described above.
  • the image may comprise the whole biometric recognition subject or parts of the biometric recognition subject, for example the face, the eye or the hand.
  • the image may comprise one or several pattern features, for example a central pattern feature and its direct neighbors.
  • the training dataset may further comprise an indicator indicating if the image shows a real user, i.e. a human being, or a spoofing object, for example a silicone mask.
  • the training datasets may originate from real measurements or they may be generated by the method of the present invention. It is also possible that a training dataset comprises both real measurement datasets and datasets generated by the method of the present invention.
  • the method of the present invention comprises receiving context information or context data.
  • the term “receiving” may refer to reading the context information from a file, a database or from an input, for example through a user interface.
  • Context information may mean all information associated with the biometric capture device, the biometric capture environment or the biometric recognition subject.
  • the context information may represent physical properties of the biometric capture device, the biometric capture environment or the biometric recognition subject.
  • Context information associated with the biometric capture device may comprise information about hardware components, in particular those hardware components which may influence the process of biometric recognition. In particular, it may include information about hardware components which are employed to retrieve data from a user which serve to authenticate the user.
  • Information about hardware components may comprise hardware specifications such as specifications associated with the optics of the biometric capture device, in particular specifications associated with the diffractive or refractive properties of the biometric capture device, i.e. with properties of the biometric capture device which are related to how light is diffracted or refracted in the device.
  • the biometric capture device may include a light projector, a camera or any component placed in the light ray path, for example optical components like an aperture, a lens, a shutter or a display in case the camera is placed behind the display.
  • context information may be associated with diffractive or refractive properties of the camera, the lens, the shutter or the display.
  • Context information associated with the biometric capture device may hence comprise information about the type of light projector, for example a laser array, such as a VCSEL array, a laser with a diffractive optical element (DOE), or an LED array; the light projector settings, for example its illumination power, the wavelength of emitted light, the coherence length of the emitted light or the divergence of the emitted light; or the characteristics of the illumination, for example information related to a projected pattern like shape, size or density of the pattern features.
  • Context information associated with the biometric capture device may further comprise information about the type of the camera, for example its producer or its name, or its characteristics such as resolution, radiant sensitive area, spectral range of sensitivity, spectral sensitivity at certain wavelengths, half angle, dark current, open circuit voltage, short-circuit current, rise time, fall time, forward voltage, capacitance, temperature coefficient, noise equivalent power, detection limit or thermal resistance junction to ambient.
  • Context information associated with the biometric capture device may further comprise information about the type of display, for example its producer or its name, or its characteristics such as pixel size, pitch size of pixels, transmission rate at certain wavelengths, diffraction loss, or the power threshold the display can withstand.
  • Context information associated with the biometric capture environment may comprise all surrounding factors which may influence the process of capturing sensor data required for biometric recognition.
  • Context information associated with the biometric capture environment may comprise information about the distance of a biometric recognition subject to be authenticated to the recording hardware, the angle under which a biometric recognition subject to be authenticated is recorded, the surrounding temperature, the light conditions, for example the intensity of the background light.
  • Context information associated with the biometric recognition subject may comprise information about the subject to be recognized.
  • the biometric recognition subject may be a user, for example a human, to be recognized, or it may be a thing, for example a spoofing mask.
  • Context information associated with the biometric recognition subject may comprise material information about the subject, for example skin, wood, paper or silicone; surface information such as skin tone, type and/or degree of make-up, albedo, reflectance; or personal information, for example the age, size, gender, or ethnic origin of a person.
  • Context information may comprise categorical, such as ordinal or nominal, or numeric variables.
  • numeric variables, in particular metric variables reflecting physical properties, for example light intensity or the focal length of a lens, are preferred. Such variables allow interpolation to values for which no training data is available.
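  • For illustration, context information could be encoded into a conditioning vector as sketched below, with categorical variables one-hot encoded and numeric variables passed through after a crude normalisation; the field names and categories are hypothetical.

```python
# Illustrative encoding of context information into a conditioning vector.
import numpy as np

CAMERA_TYPES = ["cmos_nir", "cmos_rgb", "ccd"]  # hypothetical nominal variable

def encode_context(camera_type: str,
                   distance_m: float,
                   ambient_light_lux: float,
                   wavelength_nm: float) -> np.ndarray:
    one_hot = np.zeros(len(CAMERA_TYPES))
    one_hot[CAMERA_TYPES.index(camera_type)] = 1.0
    numeric = np.array([distance_m,
                        ambient_light_lux / 1000.0,
                        wavelength_nm / 1000.0])  # crude normalisation
    return np.concatenate([one_hot, numeric])

print(encode_context("cmos_nir", 0.4, 200.0, 940.0))
```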
  • the method of the present invention comprises generating a dataset by providing the context information to a generative model.
  • the generative model is configured to receive context information as input and to output a dataset for training a biometric recognition system based on the context information.
  • the generative model may be parametrized according to the context information.
  • the generative model may comprise multiple statistically dependent conditional submodels or functions, where each submodel or function is specifically conditioned for particular context information. Combinations of these approaches are conceivable.
  • the generative model may be a data-driven model.
  • the generative model may be an artificial neural network.
  • the generative model may be a probabilistic model.
  • the generative model may be a flow-based generative model, for example normalizing flows, a variational autoencoder (VAE) or a generative adversarial network (GAN). It is also possible to use variations or combinations of such models, for example SurVAE as described by Didrik Nielsen et al. at the 34th Conference on Neural Information Processing Systems (NeurIPS 2020), arXiv:2007.02731v2.
  • VAE: variational autoencoder
  • GAN: generative adversarial network
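  • As an illustration of how such a conditional generative model can accept context information as an additional input, the following minimal sketch shows a conditional VAE in PyTorch. All names, layer sizes and the flattened-capture representation are illustrative assumptions and not part of the invention; a normalizing flow or a GAN could be conditioned on the context vector in the same way.

```python
import torch
import torch.nn as nn

class ConditionalVAE(nn.Module):
    """Conditional generative model: the context vector is an extra input to encoder and decoder."""

    def __init__(self, data_dim=1024, ctx_dim=8, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(data_dim + ctx_dim, 256), nn.ReLU())
        self.to_mu = nn.Linear(256, latent_dim)
        self.to_logvar = nn.Linear(256, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + ctx_dim, 256), nn.ReLU(),
            nn.Linear(256, data_dim),
        )

    def forward(self, x, ctx):
        # x: flattened biometric capture, ctx: numeric context vector (camera, display, environment, ...)
        h = self.encoder(torch.cat([x, ctx], dim=-1))
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.decoder(torch.cat([z, ctx], dim=-1)), mu, logvar

    @torch.no_grad()
    def generate(self, ctx, n):
        # Sample latent codes from the prior and decode them for the given context (ctx of shape (1, ctx_dim)).
        z = torch.randn(n, self.to_mu.out_features)
        return self.decoder(torch.cat([z, ctx.expand(n, -1)], dim=-1))
```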
  • the generative model may be a trained generative model, i.e. a generative model which is trained with training data.
  • the training data may comprise training datasets comprising biometric capture data, for example an image or a sound file.
  • the training datasets may further comprise annotations representing context information.
  • the context information may serve as condition for the biometric data such that the model is able to identify correlations between the context information and features of the biometric data.
  • the generative model may be a conditional model, wherein the context information is used as condition for the model.
  • the training may involve adjusting model parameters with the goal of minimizing a loss function. For example, for probabilistic models the Kullback-Leibler divergence between the model's distribution and the empirical distribution may be minimized to maximize the model's likelihood. In the case of GANs, the loss function may be a regularized minimax loss, which accounts for the convergence behavior. Convergence may be improved by starting the training with low-resolution images and gradually increasing the resolution, or by the two time-scale update rule.
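  • The following is a hedged sketch of such a training loop for the conditional VAE sketched above; maximizing the evidence lower bound (reconstruction term plus KL term) is one common way of driving the model distribution towards the empirical distribution. The data loader yielding (capture, context) pairs is an assumption.

```python
import torch
import torch.nn.functional as F

def train(model, loader, epochs=10, lr=1e-3):
    # loader is assumed to yield (capture, context) pairs as float tensors.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, ctx in loader:
            recon, mu, logvar = model(x, ctx)
            # Reconstruction term plus KL term of the evidence lower bound; maximizing
            # it pushes the model distribution towards the empirical distribution.
            rec = F.mse_loss(recon, x, reduction="sum") / x.size(0)
            kld = -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1))
            loss = rec + kld
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```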
  • additional training data may become available, for example if new images are taken with a new camera.
  • the additional training data may be associated with context information not yet comprised in the training datasets the generative model was previously trained with.
  • the additional training data may be added to the previous training data by encoding the source of the data, for example a dataset ID, within the context.
  • the generative model may be regarded as a conditional model with respect to the underlying training datasets.
  • this method is advantageous for conditional models.
  • the conditioning of models on different training datasets may also inject an implicit transfer-function which allows to transfer biometric capture data from a source system to a target system. A biometric capture from the training dataset of the source system may be transferred to the target system in three steps.
  • a context-free representation of the biometric capture may be computed by using the context information of the source system.
  • the context information may be modified to represent the target-system and in the last step the modified context information may be used to compute a biometric capture in the target system.
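  • A minimal sketch of this three-step transfer is given below, reusing the interface of the conditional model sketched earlier; with an invertible flow model, the encoder and decoder would be replaced by f and its inverse. All names are illustrative assumptions.

```python
import torch

@torch.no_grad()
def transfer_capture(model, x_source, ctx_source, ctx_target):
    # Step 1: context-free representation of the capture, computed with the source context.
    h = model.encoder(torch.cat([x_source, ctx_source], dim=-1))
    z = model.to_mu(h)
    # Step 2: the context information is replaced by the target-system context.
    # Step 3: the corresponding capture in the target system is computed.
    return model.decoder(torch.cat([z, ctx_target], dim=-1))
```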
  • In many cases, however, it is very difficult to stabilize the conditions of the biometric captures, for example due to non-availability or aging of the biometric subjects, non-availability of certain make-ups or the limited controllability of certain parameters like skin properties. In these cases it may be more reasonable not to try to transfer individual biometric captures from one system to another, but to transfer the model in function space and then generate random biometric captures of the target system with a given context.
  • the trained generative model may be fine-tuned only with the additional training data. Fine-tuning may comprise limiting changes of parameters of the generative model to mitigate or avoid catastrophic interference or catastrophic forgetting.
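  • One possible way to limit parameter changes during fine-tuning is an L2 penalty towards the previously trained weights, as sketched below; the concrete regularizer (for example elastic weight consolidation or freezing layers would be alternatives) and the loss function are assumptions for illustration.

```python
import torch

def fine_tune(model, new_loader, loss_fn, steps=1000, lr=1e-4, anchor_weight=10.0):
    # Snapshot of the previously trained parameters ("anchor").
    anchor = {name: p.detach().clone() for name, p in model.named_parameters()}
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    data = iter(new_loader)
    for _ in range(steps):
        try:
            x, ctx = next(data)
        except StopIteration:
            data = iter(new_loader)
            x, ctx = next(data)
        loss = loss_fn(model, x, ctx)  # task loss on the additional data only
        # Penalizing deviations from the anchor limits catastrophic forgetting.
        penalty = sum(((p - anchor[name]) ** 2).sum() for name, p in model.named_parameters())
        opt.zero_grad()
        (loss + anchor_weight * penalty).backward()
        opt.step()
    return model
```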
  • the trained generative model may receive context information and output multiple different datasets for training a biometric recognition system. Variations of the output may arise from randomizing an internal state of the model.
  • the model may comprise a random vector containing values from the training procedure.
  • the random vector may be of higher dimensionality than the biometric capture used as input during training of the model.
  • the random vector may be varied by adding or multiplying small random values.
  • the variation may correspond to the probability distribution of corresponding features in the training data.
  • the generative model may output multiple different datasets corresponding to a probability distribution in the training data the generative model was trained with.
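  • The following sketch illustrates how multiple different outputs may be obtained for the same context by perturbing the internal random vector with small random values; the latent dimensionality and the perturbation scale are illustrative assumptions.

```python
import torch

@torch.no_grad()
def generate_variations(model, ctx, n_variants=8, sigma=0.1, latent_dim=64):
    # A base random vector plus small random perturbations yields several different
    # captures for one and the same context vector (ctx of shape (1, ctx_dim)).
    z = torch.randn(1, latent_dim) + sigma * torch.randn(n_variants, latent_dim)
    return model.decoder(torch.cat([z, ctx.expand(n_variants, -1)], dim=-1))
```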
  • the method of the present invention comprises outputting the dataset obtained from the model.
  • Outputting can mean writing the dataset to a non-transitory data storage medium, for example into a file or database, displaying it on a user interface, for example a screen, or both. It is also possible to output the dataset through an interface to a cloud system for storage and/or further processing.
  • the present invention further relates to a non-transient computer-readable medium including instructions that, when executed by one or more processors, cause the one or more processors to perform the method according to the present invention.
  • computer-readable data medium may refer to any suitable data storage device or computer readable memory on which is stored one or more sets of instructions (for example software) embodying any one or more of the methodologies or functions described herein.
  • the instructions may also reside, completely or at least partially, within the main memory and/or within the processor during execution thereof by the computer, main memory, and processing device, which may constitute computer-readable storage media.
  • the instructions may further be transmitted or received over a network via a network interface device.
  • Computer-readable data media include hard drives, for example on a server, USB storage devices, CDs, DVDs or Blu-ray discs.
  • the computer program may comprise all functionalities and data required for execution of the method according to the present invention or it may provide interfaces to have parts of the method processed on remote systems, for example on a cloud system.
  • the invention further relates to a method for granting a user access to a device or application.
  • a device can be a mobile device, for example a smartphone, a tablet computer, a laptop computer or a smartwatch, or it can be a stationary device such as a payment terminal or an access control system, for example to control access to a building, a subway train station, an airport gate, a production facility, a car rental site, an amusement park, a cinema, or a supermarket for registered customers.
  • the access control system may further be integrated into a vehicle, for example a car, a train, an airplane, or a ship.
  • An application may refer to a local program, for example installed on a smartphone or a laptop, or a remote service, for example a service on a cloud system to be accessed via internet.
  • the application may serve several purposes, for example to authorize a payment, identify the user for a transaction with the public administration, for example to renew a driver's license, or authorize the user for high-security communication.
  • FIG. 1 illustrates an embodiment of the present invention.
  • Figure 2 illustrates an example of a generative model.
  • Figure 3 illustrates another example of a generative model.
  • Figure 4 illustrates an example for a biometric recognition system.
  • Figure 5 illustrates an exemplary use of the invention.
  • Figure 6 illustrates an example for a biometric recognition.
  • a biometric recognition system 110 may receive biometric sensor data 112. Such biometric sensor data 112 may be a pattern image recorded from a user under illumination of patterned infrared light. Alternatively or additionally, biometric sensor data 112 may be an audio record of the voice of a user. The biometric recognition system 110 may analyze the biometric sensor data 112, including for example a comparison to reference data 111. Based on the result, the biometric recognition system 110 may generate an authentication output 120. The authentication output 120 may be a signal indicating whether an authorized user has been detected.
  • the biometric recognition system 110 may comprise a data-driven model.
  • Such model may be or comprise a material model which receives images showing a user or parts of a user under illumination of a structured light pattern and classifies the images into those showing real skin and those showing other materials, for example a silicon spoofing mask.
  • the data driven model of the biometric recognition system 110 may be a trained model, which is trained with training data.
  • the training data may comprise original training data 101, i.e. data originating from biometric sensors and potentially being labelled manually, and generated training data 104.
  • Generated training data may be training data which has been generated by a generative model 102.
  • the generative model 102 in turn may be a data-driven model, for example a flow-based generative model.
  • the generative model 102 may be trained with the original training data 101 as well as context information 103.
  • Context information 103 may be related to the biometric capture device of the biometric recognition system 110, for example the lens characteristics of the camera or refractive properties of a transparent display.
  • FIG. 2 illustrates an example of a generative model 210.
  • a biometric capture, for example an image 201, may be recorded by a biometric capture device, for example a camera or a touch-sensitive screen.
  • the image 201 or parts thereof may be used as input to a function f of generative model 210.
  • Function f is configured to transform the image 201 into a random vector 203.
  • the image 201 may be first transformed into a feature vector which may be used as input.
  • the random vector 203 may provide a context-free representation of the image 201 in terms of independent identically distributed variables.
  • the random vector 203 may be designed to have a higher dimensionality than the image 201.
  • function f may be parametrized according to the context vector.
  • the context vector 202 may comprise elements related to the hardware of the biometric recognition system and/or the biometric capture environment.
  • Function f may be parametrized such that random vector 203 depends both on image 201 and context vector 202.
  • Function f may be trained by approximating the conditional probability distribution p(x | y) = p_Z(f(x; y)) · |det ∂f(x; y)/∂x|, i.e. the change-of-variables relation used for flow-based models, where p_Z denotes the distribution of the random vector 203.
  • the variable x refers to the elements of the feature vector 202.
  • the variable y refers to the elements of the context vector 204.
  • the generative model 210 may generate various feature vectors 202 and images 201 by varying the random vector 203 according to its probability distribution and choosing the context vector as appropriate, i.e. according to the biometric recognition system to be used. Such generated feature vectors or images may be used to train the model of a biometric recognition system, either alone or in combination with real biometric sensor data.
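  • As an illustration of combining generated and real captures for training the recognition model, the following sketch builds a joint data loader; the dataset names and label tensors are assumptions.

```python
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

def build_training_loader(real_images, real_labels, gen_images, gen_labels, batch_size=64):
    real_ds = TensorDataset(real_images, real_labels)
    gen_ds = TensorDataset(gen_images, gen_labels)
    # Generated captures may be used alone (gen_ds only) or mixed with real captures.
    return DataLoader(ConcatDataset([real_ds, gen_ds]), batch_size=batch_size, shuffle=True)
```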
  • Figure 3 illustrates another example of a generative model 310.
  • the model is similar to the generative model described for figure 2.
  • the image 301 may show a face of a user under illumination with structured light, for example a hexagonal point pattern.
  • the image 301 may be cropped into multiple patches 302, wherein a patch may show a subset of the patterns in image 301. For example, one patch may have a spot in the center and the neighboring spots around it.
  • Each patch 302 may be used to train a function f of the generative model.
  • the training involves a context vector 304 comprising information related to the hardware and the environment under which image 301 was recorded.
  • the generative model 310 outputs a random vector 308.
  • the inverse function f⁻¹ may be calculated, which may be used to generate patches 302 depending on the choice of a context vector 304.
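  • The cropping of the image into spot-centered patches may, for example, be implemented as sketched below; spot detection is assumed to be available, and the patch size is an illustrative assumption.

```python
import numpy as np

def extract_patches(image, spot_centers, patch_size=64):
    # Crop a square patch around each detected spot so that the spot lies in the
    # center and its neighboring spots fall inside the crop.
    half = patch_size // 2
    patches = []
    for row, col in spot_centers:
        r0, c0 = int(row) - half, int(col) - half
        if r0 >= 0 and c0 >= 0 and r0 + patch_size <= image.shape[0] and c0 + patch_size <= image.shape[1]:
            patches.append(image[r0:r0 + patch_size, c0:c0 + patch_size])
    return np.stack(patches) if patches else np.empty((0, patch_size, patch_size))
```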
  • FIG. 4 illustrates an example for a biometric recognition system 400.
  • the biometric recognition system 400 may be integrated into a portable device, for example a smartphone, a tablet computer, a laptop computer or a smartwatch. It may comprise a projector 401 which projects light 411 onto a user 410.
  • the light may be infrared light, for example with a wavelength of 850 nm, which is invisible to the user 410.
  • the projected light 411 may be floodlight or patterned light, for example a hexagonal point pattern.
  • the projected light 411 may impinge on the face of the user 410, but it may also impinge on the whole head including hair, the upper part of the body including head, neck and shoulders or even the complete body.
  • the reflected light 412 may be recorded by a camera 402 which thereby captures an image of the user 410 illuminated by the projected light.
  • the camera 402 may generate an image in the optical range matching the wavelength emitted by projector 401, for example in the infrared range.
  • the image may be a grayscale image, i.e. each pixel comprises only the total intensity information, or an RGB image, i.e. different pixels indicate the intensity in particular wavelength ranges.
  • the image may be passed to processor 403.
  • the processor 403 may be a microcontroller, i.e. comprising memory and IO controller functionalities, or it may be a CPU which is connected to memory and IO controllers.
  • the processor 403 may execute program code which determines if the user 410 is an authorized user 410.
  • Such determination may involve vectorizing the image into features. The resulting feature vector may be compared to a stored template. If the difference between the feature vector and the stored template is below a predefined threshold, the processor may determine that the user 410 is authorized. The processor 403 may further determine if the image really shows a human rather than a spoofing mask. This may be accomplished by classifying the material of the face in the image by evaluating reflection characteristics in the reflected light. If no skin is detected, the processor may determine that the user 410 in front of the transparent display is not authorized. The processor 403 may be communicatively coupled to memory 404.
  • the memory 404 may be transient memory, for example random access memory (RAM), or persistent memory, for example flash memory.
  • the memory 404 may comprise program code configured to determine if the user 410 is an authorized person as well as templates for registered authorized users.
  • the processor 403 may generate a signal indicating that the user 410 is authorized.
  • the signal may be forwarded to an access control 405 for unlocking the device, granting access to an application, or effecting a secure payment, for example via a wireless communication interface.
  • the biometric recognition system 400 may comprise a display 406.
  • the display 406 may be transparent for the projected light 411 and the reflected light 412, such that the projector 401 and the camera 402 may be placed behind the display 406.
  • the display 406 may be transparent only at the positions at which the projected light 411 and the reflected light 412 pass the display 406. Transparent may mean that at least 30 % or at least 50 % of the incident light passes through the transparent display 406.
  • a biometric recognition system 500a, which may be a smartphone, may have a camera and/or a projector behind a display 501a.
  • the biometric recognition system 500a may comprise a biometric recognition model for authenticating a user, trained with images obtained through the display 501a.
  • the display 501a may be replaced by a different display 501b in an exchange step 502. This may be part of a version update of the biometric recognition system 500a to a biometric recognition system 500b.
  • a generative model 520 may be provided with context information associated with the display 501b comprising, for example, pixel size, pixel density, pitch size and transparency of display 501b.
  • the generative model 520 may generate training data 530, for example image patches of a face, which is adjusted to the display 501b without the need to newly record images of test persons with the display 501b.
  • the generated training data 530 may be used to train (503) the system for biometric recognition, whereupon the biometric recognition system 500b is ready for use within a short period of time.
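  • The display-exchange workflow may be illustrated by the following hedged sketch, in which the new display is described by a numeric context vector, synthetic patches are generated for it and the recognition model is retrained; the context encoding, the generator interface and the training function are assumptions for illustration.

```python
import torch

def adapt_to_new_display(generator, recognition_model, train_fn, n_samples=10000):
    # Context vector describing the new display 501b (encoding is an assumption):
    # pixel size in micrometres, pixel density in PPI, near-infrared transparency.
    new_display_ctx = torch.tensor([[17.0, 420.0, 0.35]])
    synthetic_patches = generator.generate(new_display_ctx, n_samples)
    # Retrain or fine-tune the recognition model on the generated data (503).
    return train_fn(recognition_model, synthetic_patches)
```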
  • the biometric authentication system may be a face authentication system which verifies if a face in front of a camera is really the claimed person, so neither a different person nor a spoofing mask.
  • the biometric authentication system may be integrated into a portable device such as a smartphone, or in an access system, for example a door opening system of a building or a vehicle.
  • the face of the person may be illuminated with patterned illumination 601.
  • a camera may record a pattern image 602 of the face under patterned illumination.
  • the pattern image 602 may be used to authenticate the material 603 of the face as described for figure 3 and 4.
  • the material authentication may yield a similarity score. If the similarity score is below a threshold, the authentication may be rejected 630.
  • the face may be illuminated with flood light 611, for example shortly before or after illuminating the face with patterned light 601.
  • a camera may record a flood image 612 of the face under flood illumination 611.
  • the flood image 612 may be used to recognize the identity of the person, for example by extracting features of the flood image and comparing them with a reference database. If recognition yields an incorrect person, for example a person without access rights, the authentication may be rejected 630. If both the correct person is identified and the material authentication yields a similarity score above the threshold, the person may be authenticated 620.
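  • The decision flow of figure 6 may be summarized by the following sketch; the score range, the threshold value and the identity representation are illustrative assumptions.

```python
def authenticate(material_score, recognized_id, claimed_id, material_threshold=0.5):
    # Reject if the material authentication score is too low (e.g. a spoofing mask).
    if material_score < material_threshold:
        return False
    # Reject if the recognized identity does not match the claimed identity.
    if recognized_id != claimed_id:
        return False
    # Both checks passed: the person is authenticated.
    return True
```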
  • Providing in the scope of this disclosure may include any interface configured to provide data. This may include an application programming interface, a human-machine interface such as a display and/or a software module interface. Providing may include communication of data or submission of data to the interface, in particular display to a user or use of the data by the receiving node, entity or interface.
  • Various units, circuits, entities, nodes or other computing components may be described as "configured to" perform a task or tasks. "Configured to" shall recite structure, meaning "having circuitry that" performs the task or tasks in operation. The units, circuits, entities, nodes or other computing components can be configured to perform the task even when the unit/circuit/component is not operating. The units, circuits, entities, nodes or other computing components that form the structure corresponding to "configured to" may include hardware circuits and/or memory storing program instructions executable to implement the operation. The units, circuits, entities, nodes or other computing components may be described as performing a task or tasks, for convenience in the description. Such descriptions shall be interpreted as including the phrase "configured to." Any recitation of "configured to" is expressly intended not to invoke 35 U.S.C. § 112(f) interpretation.
  • the methods, apparatuses, systems, computer elements, nodes or other computing components described herein may include memory, software components and hardware components.
  • the memory can include volatile memory such as static or dynamic random-access memory and/or nonvolatile memory such as optical or magnetic disk storage, flash memory, programmable read-only memories, etc.
  • the hardware components may include any combination of combinatorial logic circuitry, clocked storage devices such as flops, registers, latches, etc., finite state machines, memory such as static random-access memory or embedded dynamic random-access memory, custom designed circuitry, programmable logic arrays, etc.


Abstract

The invention is in the area of generation of datasets for biometric recognition. It relates to a computer-implemented method for generating datasets for training a biometric recognition system comprising a. receiving context information associated with hardware components of the biometric capture device, b. generating a dataset for training a biometric recognition system by providing the context information to a generative model which is configured to receive context information as input and output a dataset for training a biometric recognition system based on the context information, and c. outputting the dataset.

Description

Biometric Recognition Dataset Generation
The invention is in the area of generation of datasets for biometric recognition. The invention relates to a method for generating datasets for training a biometric recognition system, the use of the datasets for training a biometric recognition system, a biometric recognition system trained with datasets, a system for generating datasets for training a biometric recognition system and a non-transient computer-readable medium including instructions for generating datasets for training a biometric recognition system.
Background
Modern biometric recognition systems typically involve models, for example artificial neural networks. These models are trained to differentiate between an authorized user and an unauthorized user or a spoofing mask. For example, US 2020/0104570 discloses an optical face recognition system in which an image from a user is evaluated including a trained neural network.
Hatef Otroshi Shahreza et al., IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45 (2023), pages 14248-14265, disclose reconstruction of face images at different angles taking into account camera parameters corresponding to the camera rotation (page 14255, left column, last paragraph). Such images can be used for face recognition. However, the obtained data can hardly be used in face recognition systems using different hardware components.
To achieve high reliability of biometric recognition systems the model training requires many training datasets. The datasets must involve many different biometric recognition subjects each recorded at various environmental conditions, for example different distances, angles, temperatures, or light conditions. In addition, the biometric capture device of the recognition system plays a role, for example for an optical scanner system, the illumination, the camera and potentially a display through which the light is transmitted must be taken into account. Recording many biometric recognition subjects under different environmental conditions with all conceivable biometric capture devices is practically impossible.
Summary
In one aspect the present invention relates to a computer-implemented method for generating datasets for training a biometric recognition system comprising a. receiving context information associated with hardware components of the biometric capture device, b. generating a dataset for training a biometric recognition system by providing the context information to a generative model which is a data-driven model and which is configured to receive context information as input and to output a dataset for training a biometric recognition system based on the context information, and c. outputting the dataset.
In another aspect the present invention relates to a computer-implemented method for generating datasets for training a biometric recognition system comprising a. receiving context information associated with the biometric capture device, the biometric capture environment or the biometric recognition subject, b. generating a dataset for training a biometric recognition system by providing the context information to a generative model which is configured to receive context information as input and to output a dataset for training a biometric recognition system based on the context information, and c. outputting the dataset.
In another aspect the present invention relates to a use of the datasets obtained by the method according to the invention for training a data-driven model of a biometric recognition system.
In another aspect the present invention relates to a use of the datasets obtained by the method according to the invention for training a biometric recognition system.
In another aspect the present invention relates to a method of access control to a device or application comprising a. receiving a request for accessing the device or application, b. in response to the request executing a biometric recognition with a system trained with datasets obtained by the method according to the present invention, c. granting access to the device or application depending on the outcome of the biometric recognition.
In another aspect the present invention relates to a biometric recognition system comprising a data-driven model trained with datasets obtained by the method according to the present invention.
In another aspect the present invention relates to a biometric recognition system trained with datasets obtained by the method according to the present invention.
In another aspect the present invention relates to a system for generating datasets for training a biometric recognition system comprising a. an input for receiving context information associated with hardware components of the biometric capture device, b. a processor for generating a dataset for training a biometric recognition system by providing the context information to a generative model which is a data-driven model and which is configured to receive context information as input and to output a dataset for training a biometric recognition system based on the context information, and c. an output for outputting the dataset.
In another aspect the present invention relates to a system for generating datasets for training a biometric recognition system comprising a. an input for receiving context information associated with the biometric capture device, the biometric capture environment or the biometric recognition subject, b. a processor for generating a dataset for training a biometric recognition system by providing the context information to a generative model which is configured to receive context information as input and to output a dataset for training a biometric recognition system based on the context information, and c. an output for outputting the dataset.
In another aspect the present invention relates to a non-transient computer-readable medium including instructions that, when executed by one or more processors, cause the one or more processors to perform a method comprising a. receiving context information associated with hardware components of the biometric capture device, b. generating a dataset for training a biometric recognition system by providing the context information to a generative model which is a data-driven model and which is configured to receive context information as input and to output a dataset for training a biometric recognition system based on the context information, and c. outputting the dataset.
In another aspect the present invention relates to a non-transient computer-readable medium including instructions that, when executed by one or more processors, cause the one or more processors to perform a method comprising a. receiving context information associated with the biometric capture device, the biometric capture environment or the biometric recognition subject, b. generating a dataset for training a biometric recognition system by providing the context information to a generative model which is configured to receive context information as input and to output a dataset for training a biometric recognition system based on the context information, and c. outputting the dataset.
In another aspect the present invention relates to a generative model which is configured to receive context information as input and to output a dataset for training a biometric recognition system based on the context information.
The generated datasets enable training of a biometric recognition system with higher accuracy. A biometric recognition system may be implemented by multimodal approaches that involve separate models, specialized on different biometric features. These models may be trained to collaboratively ensure the identity of a biometric recognition subject and protect it against spoofing attacks. To achieve high reliability of such multimodal biometric recognition systems the training of the specialized models requires comprehensive datasets which sufficiently represent the respectively assessed biometric features within the considered target population. Further, the biometric recognition system is more robust against environmental conditions, for example different distances, angles, temperatures, or light conditions. The demand for recording test persons under various conditions is drastically reduced, so the biometric recognition system is more rapidly ready for use. In addition, when exchanging hardware of the biometric recognition system, for example a projector or camera optics, no new recording of test persons is required, enabling a quick adoption of new hardware components. Furthermore, the biometric recognition system can produce more reliable results for biometric recognition subjects with an unusual appearance, for example stains, scars or other deformations due to an accident or a disease. Such rare appearances can hardly be taken into account from biometric captures of real humans as it is usually too difficult to acquire enough representative test persons.
Biometric recognition may refer to any procedure which uses a characteristic of a human to identify the user. Biometric recognition may comprise identity recognition, i.e. is the correct person in front of the biometric capture device, and authentication, i.e. is a real person in front of the biometric capture device and not a spoofing object which looks identical to the correct person. Biometric recognition may include optical biometric recognition like face recognition, iris scan, palm scan or fingerprint scan; or acoustic recognition like voice recognition. Optical biometric recognition may be passive, i.e. an image is recorded of the user or part of the user to be recognized, wherein the user is only under irradiation of ambient light. Optical biometric recognition may be active, i.e. an image is recorded of the user or part of the user to be recognized, wherein the user is under irradiation of light emitted by a projector.
The term "light” may refer to electromagnetic radiation in one or more of the infrared, the visible and the ultraviolet spectral range. Herein, the term "ultraviolet spectral range”, generally, refers to electromagnetic radiation having a wavelength of 1 nm to 380 nm, preferably of 100 nm to 380 nm. Further, in partial accordance with standard ISO- 21348 in a valid version at the date of this document, the term "visible spectral range”, generally, refers to a spectral range of 380 nm to 760 nm. The term "infrared spectral range” (IR) generally refers to electromagnetic radiation of 760 nm to 1000 pm, wherein the range of 760 nm to 1.5 pm is usually denominated as "near infrared spectral range” (NIR) while the range from 1.5 p to 15 pm is denoted as "mid infrared spectral range” (MidlR) and the range from 15 pm to 1000 pm as "far infrared spectral range” (FIR). Preferably, light used for the typical purposes of the present invention is light in the infrared (IR) spectral range, more preferred, in the near infrared (NIR) and/or the mid infrared spectral range (MidlR), especially the light having a wavelength of 1 pm to 5 pm, preferably of 1 pm to 3 pm.
An optical biometric recognition system may comprise a projector. The term "projector” may refer to a device configured for generating or providing light in the sense of the above-mentioned definition. The projector may be a pattern projector, a floodlight projector or both either simultaneously or the projector may repeatedly switch from illuminating patterned light to floodlight. The term "pattern projector” may refer to a device configured for generating or providing at least one light pattern, in particular at least one infrared light pattern. The term "light pattern” may refer to at least one pattern comprising a plurality of light spots. The light spot may be at least partially spatially extended. At least one spot or any spot may have an arbitrary shape. In some cases, a circular shape of at least one spot or any spot may be preferred. The spots may be arranged by considering a structure of a display comprised by a device that is further comprising the optoelectronic apparatus. Typically, an arrangement of an OLED-pixel-structure of the display may be considered. The term "infrared light pattern” may refer to a light pattern comprising spots in the infrared spectral range. The infrared light pattern may be a near infrared light pattern. The infrared light may be coherent. The infrared light pattern may be a coherent infrared light pattern.
The pattern projector may be configured for emitting monochromatic light, e.g. in the near infrared region. The term "monochromatic” may refer to light with a wavelength accuracy of less or equal to ± 2 % or less or equal to ± 1 %. The wavelength accuracy may be the maximum difference of emitted wavelength relative to the mean wavelength. In other embodiments, the pattern projector may be adapted to emit light with a plurality of wavelengths, e.g. for allowing additional measurements in other wavelengths channels.
The infrared light pattern may comprise at least one regular and/or constant and/or periodic pattern such as a triangular pattern, a rectangular pattern, a hexagonal pattern or a pattern comprising further convex tilings. For example, the infrared light pattern is a hexagonal pattern, preferably a hexagonal infrared light pattern, preferably a 2/5 hexagonal infrared light pattern.
Using a periodical 2/5 hexagonal pattern can allow distinguishing between artefacts and usable signal.
The light pattern may comprise less than 4000 spots, for example less than 3000 spots or less than 2000 spots or less than 1500 spots or less than 1000 spots. The light pattern may comprise patterned coherent infrared light of less than 4000 spots or less than 3000 spots or less than 2000 spots or less than 1500 spots or less than 1000 spots.
At least one of the infrared light spots may be associated with a beam divergence of 0.2° to 0.5°, preferably 0.1° to 0.3°. The term "beam divergence" may refer to at least one measure of an increase in at least one diameter and/or at least one diameter equivalent, such as a radius, with a distance from an optical aperture from which the beam emerges. The measure may be an angle or an angle equivalent. In the context of the present invention, typically, a beam divergence may be determined at 1/e².
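As a worked example (under the assumption that the quoted divergence is the full angle and that small-angle geometry applies), the growth of a spot diameter with distance may be estimated as follows; the numbers are illustrative only.

```python
import math

def spot_diameter(initial_diameter_m, full_divergence_deg, distance_m):
    # Diameter growth of a diverging beam: d(z) = d0 + 2 * z * tan(theta / 2).
    half_angle = math.radians(full_divergence_deg / 2.0)
    return initial_diameter_m + 2.0 * distance_m * math.tan(half_angle)

# A 1 mm spot with 0.3 degrees full divergence grows to about 3.1 mm at 0.4 m.
print(spot_diameter(1e-3, 0.3, 0.4))
```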
The pattern projector may comprise at least one pattern projector configured for generating the infrared light pattern. The pattern projector may comprise at least one emitter, in particular a plurality of emitters. The term "emitter" may refer to at least one arbitrary device configured for providing at least one light beam. The light beam may generate the infrared light pattern. The emitter may comprise at least one element selected from the group consisting of at least one laser source such as at least one semi-conductor laser, at least one double heterostructure laser, at least one external cavity laser, at least one separate confinement heterostructure laser, at least one quantum cascade laser, at least one distributed Bragg reflector laser, at least one polariton laser, at least one hybrid silicon laser, at least one extended cavity diode laser, at least one quantum dot laser, at least one volume Bragg grating laser, at least one Indium Arsenide laser, at least one Gallium Arsenide laser, at least one transistor laser, at least one diode pumped laser, at least one distributed feedback laser, at least one quantum well laser, at least one interband cascade laser, at least one semiconductor ring laser, at least one vertical cavity surface emitting laser (VCSEL); at least one non-laser light source such as at least one LED or at least one light bulb. For example, the pattern projector comprises at least one VCSEL, preferably a plurality of VCSELs. The plurality of VCSELs may be arranged in at least one array, e.g. comprising a matrix of VCSELs. The VCSELs may be arranged on the same substrate, or on different substrates. The term "vertical-cavity surface-emitting laser" may refer to a semiconductor laser diode configured for laser beam emission perpendicular with respect to a top surface. Examples for VCSELs can be found e.g. in en.wikipedia.org/wiki/Vertical-cavity_surface-emitting_laser. VCSELs are generally known to the skilled user such as from WO 2017/222618 A. Each of the VCSELs is configured for generating at least one light beam. The plurality of generated spots may be associated with the infrared light pattern. The VCSELs may be configured for emitting light beams at a wavelength range from 800 to 1000 nm. For example, the VCSELs may be configured for emitting light beams at 808 nm, 850 nm, 940 nm, and/or 980 nm. Preferably the VCSELs emit light at 940 nm, since terrestrial sun radiation has a local minimum in irradiance at this wavelength, e.g. as described in CIE 085-1989 "Solar spectral Irradiance".
The pattern projector may comprise at least one optical element configured for increasing, e.g. duplicating, the number of spots generated by the pattern projector. The pattern projector, particularly the optical element, may comprise at least one diffractive optical element (DOE) and/or at least one meta surface element. The DOE and/or the meta surface element may be configured for generating multiple light beams from a single incoming light beam. Further arrangements, particularly comprising a different number of projecting VCSELs and/or at least one different optical element configured for increasing the number of spots, may be possible. Other multiplication factors are possible. For example, a VCSEL or a plurality of VCSELs may be used and the generated laser spots may be duplicated by using at least one DOE.
The pattern projector may comprise at least one transfer device. The term "transfer device", also denoted as "transfer system", may refer to one or more optical elements which are adapted to modify the light beam, particularly the light beam used for generating at least a portion of the infrared light pattern, such as by modifying one or more of a beam parameter of the light beam, a width of the light beam or a direction of the light beam. The transfer device may comprise at least one imaging optical device. The transfer device specifically may comprise one or more of: at least one lens, for example at least one lens selected from the group consisting of at least one focus-tunable lens, at least one aspheric lens, at least one spherical lens, at least one Fresnel lens; at least one diffractive optical element; at least one concave mirror; at least one beam deflection element, preferably at least one mirror; at least one beam splitting element, preferably at least one of a beam splitting cube or a beam splitting mirror; at least one multi lens system; at least one holographic optical element; at least one meta optical element. Specifically, the transfer device comprises at least one refractive optical lens stack. Thus, the transfer device may comprise a multi-lens system having refractive properties.
The pattern projector may be configured for emitting modulated or non-modulated light. In case a plurality of emitters is used, the different emitters may have different modulation frequencies, e.g. which can be used for distinguishing the light beams.
The light beam or light beams generated by the pattern projector may propagate parallel to an optical axis. The pattern projector may comprise at least one reflective element, preferably at least one prism, for deflecting the illuminating light beam onto the optical axis. As an example, the light beam or light beams, such as the laser light beam, and the optical axis may include an angle of less than 10°, preferably less than 5° or even less than 2°. Other embodiments, however, are feasible. Further, the light beam or light beams may be on the optical axis or off the optical axis. As an example, the light beam or light beams may be parallel to the optical axis having a distance of less than 10 mm to the optical axis, preferably less than 5 mm to the optical axis or even less than 1 mm to the optical axis or may even coincide with the optical axis.
The term "flood projector” may refer to at least one device configured for providing substantially continuous spatial illumination. The flood projector may illuminate a measurement area, such as a user, a portion of the user and/or a face of the user, with a spatially constant or essentially constant illumination intensity. The term "flood light” may refer to substantially continuous spatial illumination, in particular diffuse and/or uniform illumination. The flood light has a wavelength in the infrared range, in particular in the near infrared range. The flood projector may comprise at least one least one VCSEL, preferably a plurality of VCSELs, for example an array of VCSELs. The term "substantially continuous spatial illumination” may refer to uniform spatial illumination, wherein areas of non-uniform are possible.
A relative distance between the flood projector and the pattern projector may be below 3.0 mm. The relative distance between the flood projector and the pattern projector may be below 2.5 mm, preferably below 2.0 mm. The pattern projector and the flood projector may be combined into one module. For example, the pattern projector and the flood projector may be arranged on the same substrate, in particular having a minimum relative distance. The minimum relative distance may be defined by a physical extension of the flood projector and the pattern projector. Arranging the pattern projector and the flood projector having a relative distance below 3.0 mm can result in decreased space requirement of the two projectors. In particular, said projectors can even be combined into one module. Such a reduced space requirement can allow reducing the transparent area(s) in a display necessary for operation of the projector(s) behind the display. In an embodiment, the pattern projector and the flood projector may comprise at least one VCSEL, preferably a plurality of VCSELs, for example an array of VCSELs. The pattern projector may comprise a plurality of first VCSELs mounted on a first platform. The flood projector may comprise a plurality of second VCSELs mounted on a second platform. The second platform may be beside the first platform. The optoelectronic apparatus may comprise a heat sink. Above the heat sink a first increment comprising the first platform may be attached. Above the heat sink a second increment comprising the second platform may be attached. The second increment may be different from the first increment. Thus, the first platform may be more distant to the optical element configured for increasing, e.g. duplicating, the number of spots. The second platform may be closer to the optical element. The beam emitted from the second VCSELs may be defocused and thus form overlapping spots. This leads to a substantially continuous illumination and, thus, to flood illumination.
The projector may be positioned such that it can illuminate light through the transparent display. Hence, light emitted by the projector may cross the transparent display before it impinges on the user. From the user's view, the projector may be placed behind the transparent display.
An optical biometric recognition system may comprise a camera. The term "camera” may refer to at least one unit of the optoelectronic apparatus configured for generating at least one image. The image may be generated via a hardware and/or a software interface, which may be considered as the camera. The term "image generation” may refer to capturing and/or generating and/or determining and/or recording at least one image by using the camera. The image generation may comprise imaging and/or recording the image. The image generation may comprise capturing a single image and/or a plurality of images such as a sequence of images. For generating an image via a hardware and/or a software interface, the capturing and/or generating and/or determining and/or recording of the image may be caused and/or initiated by the hardware and/or the software interface. For example, the image generation may comprise recording continuously a sequence of images such as a video or a movie. The image generation may be initiated by a user action or may automatically be initiated, e.g. once the presence of at least one object or user within a field of view and/or within a predetermined sector of the field of view of the camera is automatically detected.
The camera may comprise at least one optical sensor, in particular at least one pixelated optical sensor. The camera may comprise at least one CMOS sensor or at least one CCD chip. For example, the camera may comprise at least one CMOS sensor, which may be sensitive in the infrared spectral range. The term "image” may refer to data recorded by using the optical sensor, such as a plurality of electronic readings from the CMOS or CCD chip. The image may comprise raw image data or may be a pre-processed image. For example, the pre-processing may comprise applying at least one filter to the raw image data and/or at least one background correction and/or at least one background subtraction.
For example, the camera may comprise a color camera, e.g. comprising at least color pixels. The camera may comprise a color CMOS camera. For example, the camera may comprise black and white pixels and color pixels. The color pixels and the black and white pixels may be combined internally in the camera. The camera may comprise a color camera (e.g. RGB) or a black and white camera, such as a black and white CMOS. The camera may comprise a black and white CMOS chip. The camera generally may comprise a one-dimensional or two-dimensional array of image sensors, such as pixels.
The color camera may be an internal and/or external camera of a device comprising the optoelectronic apparatus. The internal and/or external camera of the device may be accessed via a hardware and/or a software interface comprised by the optoelectronic apparatus, which is used as the camera. In case the device is or comprises a smartphone, the image generating unit may be a front camera, such as a selfie camera, and/or a back camera of the smartphone.
The camera may have a field of view between 10°x10° and 75°x75°, preferably 55°x65°. The camera may have a resolution below 2 MP, preferably between 0.3 MP and 1.5 MP.
The camera may comprise further elements, such as one or more optical elements, e.g. one or more lenses. As an example, the optical sensor may be a fix-focus camera, having at least one lens which is fixedly adjusted with respect to the camera. Alternatively, however, the camera may also comprise one or more variable lenses which may be adjusted, automatically or manually. Other cameras, however, are feasible.
The term "pattern image” may refer to an image generated by the camera while illuminating the infrared light pattern, e.g. on an object and/or a user. The pattern image may comprise an image showing a user, in particular at least parts of the face of the user, while the user is being illuminated with the infrared light pattern, particularly on a respective area of interest comprised by the image. The pattern image may be generated by imaging and/or recording light reflected by an object and/or user which is illuminated by the infrared light pattern. The pattern image showing the user may comprise at least a portion of the illuminated infrared light pattern on at least a portion the user. For example, the illumination by the pattern illumination source and the imaging by using the optical sensor may be synchronized, e.g. by using at least one control unit of the optoelectronic apparatus.
The term "flood image” may refer to an image generated by the camera while illumination source is illuminating infrared flood light, e.g. on an object and/or a user. The flood image may comprise an image showing a user, in particular the face of the user, while the user is being illuminated with the flood light. The flood image may be generated by imaging and/or recording light reflected by an object and/or user which is illuminated by the flood light. The flood image showing the user may comprise at least a portion of the flood light on at least a portion the user. For example, the illumination by the flood illumination source and the imaging by using the optical sensor may be synchronized, e.g. by using at least one control unit of the optoelectronic apparatus. The camera may be configured for imaging and/or recording the pattern image and the flood image at the same time or at different times. The camera may be configured for imaging and/or recording the pattern image and the flood image at at least partially overlapping measurement areas or equivalents of the measurement areas.
The optical biometric recognition system may comprise a transparent display. The camera or the projector may be placed behind the transparent display in order to maximize the display area of a device. The term "display" may refer to an arbitrarily shaped device configured for displaying an item of information. The item of information may be arbitrary information such as at least one image, at least one diagram, at least one histogram, at least one graphic, text, numbers, at least one sign, or an operating menu. The display may be or may comprise at least one screen. The display may have an arbitrary shape, e.g. a rectangular shape. The display may be a front display of the device.
The display may be or may comprise at least one organic light-emitting diode (OLED) display. The term "organic light emitting diode” may refer to a light-emitting diode (LED) in which an emissive electroluminescent layer is a film of organic compound configured for emitting light in response to an electric current. The OLED display may be configured for emitting visible light. The display, particularly a display area, may be covered by glass. In particular, the display may comprise at least one glass cover.
The transparent display may be at least partially transparent. The term "at least partially transparent" may refer to a property of the display to allow light, in particular of a certain wavelength range, e.g. in the infrared spectral region, in particular in the near infrared spectral region, to pass at least partially through. For example, the display may be semitransparent in the near infrared region. For example, the display may have a transparency of 20 % to 50 % in the near infrared region. The display may have a different transparency for other wavelength ranges. For example, the display may have a transparency of > 80 % for the visible spectral range, preferably > 90 % for the visible spectral range. The transparent display may be at least partially transparent over the entire display area or only parts thereof. Typically, it is sufficient if only those parts of the display area are at least partially transparent through which light needs to pass from the projector or to the camera.
The display comprises a display area. The term "display area" may refer to an active area of the display, in particular an area which is activatable. The display may have additional areas such as recesses or cutouts. The display may have a first area associated with a first pixel per inch (PPI) value and a second area associated with a second PPI value. The first PPI value may be lower than the second PPI value; preferably the first PPI value is equal to or below 400 PPI, more preferably the second PPI value may be equal to or higher than 300 PPI. The first PPI value may be associated with the at least one continuous area being at least partially transparent.
Optical biometric recognition may comprise identifying the user based on the flood image. The term "identifying” may refer to identity check and/or verifying an identity of the user. The identifying of the user may comprise analyzing the flood image. The analyzing of the flood image may comprise performing a face verification of the imaged face to be the user's face. The identifying the user may comprise matching the flood image, e.g. showing a contour of parts of the user, in particular parts of the user's face, with a template. Determining if the imaged face is the face of the user may comprise identifying the user, in particular determining if the imaged face corresponds to at least one image of the user's face stored in at least one memory, e.g. of the device.
The analyzing may comprise one or more of the following: a filtering; a selection of at least one region of interest; a formation of a difference image between the flood image and at least one offset; an inversion of the flood image; a background correction; a decomposition into color channels; a decomposition into hue, saturation, and brightness channels; a frequency decomposition; a singular value decomposition; applying a Canny edge detector; applying a Laplacian of Gaussian filter; applying a Difference of Gaussian filter; applying a Sobel operator; applying a Laplace operator; applying a Scharr operator; applying a Prewitt operator; applying a Roberts operator; applying a Kirsch operator; applying a high-pass filter; applying a low-pass filter; applying a Fourier transformation; applying a Radon-transformation; applying a Hough-transformation; applying a wavelet-transformation; a thresholding; creating a binary image. The region of interest may be determined manually by a user or may be determined automatically, such as by recognizing the user within the image. In particular, the analyzing of the flood image may comprise using at least one image recognition technique, in particular a face recognition technique. An image recognition technique comprises at least one process of identifying the user in an image. The image recognition may comprise using at least one technique selected from the group consisting of: color-based image recognition, e.g. using features such as hue, saturation, and value (HSV) or red, green, blue (RGB); template matching, for example as illustrated on https://www.mathworks.com/help/vision/ug/pattern-matching.html; image segment and/or blob analysis, e.g. using size, color, or shape; machine learning and/or deep learning, e.g. using at least one convolutional neural network.
The neural network may be trained by the user, such as in a training procedure, in which the user is indicated to take at least one or a plurality of pictures showing himself.
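As an illustration of one of the pre-processing steps listed above, the following sketch applies a Difference of Gaussian filter followed by thresholding to obtain a binary image; the use of SciPy and the parameter values are assumptions for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def difference_of_gaussian_binary(image, sigma_small=1.0, sigma_large=3.0, threshold=0.05):
    img = image.astype(float)
    # Difference of Gaussian filter: a band-pass that emphasizes spot-like features.
    dog = gaussian_filter(img, sigma_small) - gaussian_filter(img, sigma_large)
    # Thresholding yields a binary image for subsequent processing steps.
    return (dog > threshold).astype(np.uint8)
```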
The analyzing of the flood image may comprise determining a plurality of facial features. The analyzing may comprise comparing, in particular matching, the determined facial features with template features. The template features may be features extracted from at least one template. The template may be or may comprise at least one image generated in an enrollment process, e.g. when initializing the authentication system. The template may be an image of an authorized user. The template features and/or the facial features may comprise a vector. Matching of the features may comprise determining a distance between the vectors. The identifying of the user may comprise comparing the distance of the vectors to at least one predefined limit, wherein the user is successfully identified in case the distance is smaller than or equal to the predefined limit at least within tolerances. The user is declined and/or rejected otherwise.
For example, the image recognition may comprise using at least one model, in particular a trained model comprising at least one face recognition model. The analyzing of the flood image may be performed by using a face recognition system, such as FaceNet, e.g. as described in Florian Schroff, Dmitry Kalenichenko, James Philbin, "FaceNet: A Unified Embedding for Face Recognition and Clustering", arXiv:1503.03832. The trained model may comprise at least one convolutional neural network. For example, the convolutional neural network may be designed as described in M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks", CoRR, abs/1311.2901, 2013, or C. Szegedy et al., "Going deeper with convolutions", CoRR, abs/1409.4842, 2014. For more details with respect to the convolutional neural network for the face recognition system, reference is made to Florian Schroff, Dmitry Kalenichenko, James Philbin, "FaceNet: A Unified Embedding for Face Recognition and Clustering", arXiv:1503.03832. As training data, labelled image data from an image database may be used. Specifically, labeled faces may be used from one or more of G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller, "Labeled faces in the wild: A database for studying face recognition in unconstrained environments", Technical Report 07-49, University of Massachusetts, Amherst, October 2007, the Youtube® Faces Database as described in L. Wolf, T. Hassner, and I. Maoz, "Face recognition in unconstrained videos with matched background similarity", in IEEE Conf. on CVPR, 2011, or the Google® Facial Expression Comparison dataset. The training of the convolutional neural network may be performed as described in Florian Schroff, Dmitry Kalenichenko, James Philbin, "FaceNet: A Unified Embedding for Face Recognition and Clustering", arXiv:1503.03832.
Image artifacts caused by diffraction of the light when passing the transparent display may be corrected. The term "correct" may mean partially or fully removing the artifacts or tagging them so they can be excluded from further processing, in particular from determining if the imaged user is an authorized user. Correcting image artifacts may take into account the information about the transparent display, in particular the dimensions of the pixels or the distance of repeating features to each other. This information can facilitate identifying artifacts, as diffraction patterns can be calculated and compared to the image. Correcting image artifacts may comprise identifying reflection features, sorting them by brightness and selecting the locally brightest features. For determining a distance around a feature in the image which qualifies as local, the information of the transparent display may be used; in particular, a distance in the image by which a light beam may be displaced by diffraction on the transparent display may be calculated based on the information about the transparent display. This method can be particularly useful for pattern images. Further details are disclosed in WO 2021/105265 A1.
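A simplified, non-limiting sketch of the brightness-based selection of reflection features described above is given below; the neighborhood radius would in practice be derived from the diffraction displacement calculated from the display information, and all names and values are illustrative assumptions:

```python
def select_locally_brightest(features, local_radius):
    """features: iterable of (x, y, brightness) tuples describing reflection features.
    Keep only features that are the brightest within a neighborhood whose radius
    may be derived from the diffraction displacement caused by the transparent display."""
    ordered = sorted(features, key=lambda f: f[2], reverse=True)  # brightest first
    kept = []
    for x, y, b in ordered:
        # discard the feature if a brighter feature was already kept nearby,
        # since it is then likely a diffraction replica of that feature
        if all((x - kx) ** 2 + (y - ky) ** 2 > local_radius ** 2 for kx, ky, _ in kept):
            kept.append((x, y, b))
    return kept

spots = [(10, 10, 255), (12, 11, 120), (40, 40, 200)]
print(select_locally_brightest(spots, local_radius=5))
```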
Optical biometric recognition may comprise determining material data, for example based on the pattern image. Particularly by considering the material as a parameter for validating the authentication process, the authentication process may be made robust against being outwitted by a recorded image of the user.
Extracting the material data from the pattern image may be performed by beam profile analysis of the light spots. With respect to beam profile analysis reference is made to WO 2018/091649 A1, WO 2018/091638 A1 and WO 2018/091640 A1, the full content of which is included by reference. Beam profile analysis can allow for providing a reliable classification of scenes based on a few light spots. Each of the light spots of the pattern image may comprise a beam profile. The term "beam profile" may generally refer to at least one intensity distribution of the light spot on the optical sensor as a function of the pixel. The beam profile may be selected from the group consisting of a trapezoid beam profile; a triangle beam profile; a conical beam profile and a linear combination of Gaussian beam profiles. In an embodiment, extracting material data from the pattern image may comprise generating the material type and/or data derived from the material type. Preferably, extracting material data may be based on the pattern image. Material data may be extracted by using at least one material model. Extracting material data may include providing the pattern image to a material model and/or receiving material data from the material model. Providing the image to the material model may comprise, or may be followed by, receiving the pattern image at an input layer of the material model or via a material model loss function. The material model may be a data-driven model. A data-driven model may comprise a convolutional neural network and/or an encoder-decoder structure such as an autoencoder. Other examples for generating a representation may be FFT, wavelets, or deep learning approaches such as CNNs, deep autoencoders, deep energy-based models, normalizing flows, autoregressive image modeling, GANs, vision transformers, or transformers used for natural language processing. Supervised or unsupervised schemes may be applicable to generate a representation, also referred to as an embedding, e.g. in a cosine or Euclidean metric, in machine-learning terminology. The data-driven model may be parametrized according to a training data set including at least one image and material data, preferably at least one pattern image and material data. In another embodiment, extracting material data may include providing the image to a material model and/or receiving material data from the material model. In another embodiment, the data-driven model may be trained according to a training data set including at least one image and material data. In another embodiment, the data-driven model may be parametrized according to a training data set including at least one image and material data. The data-driven model may be parametrized according to a training data set to receive the image and provide material data based on the received image. The data-driven model may be trained according to a training data set to receive the image and provide material data as output based on the received image. The training data set may comprise at least one image and material data, preferably material data associated with the at least one image. The image may comprise a representation of the image. The representation may be a lower-dimensional representation of the image. The representation may comprise at least a part of the data or the information associated with the image. The representation of an image may comprise a feature vector.
In an embodiment, determining a representation, in particular a lower-dimensional representation, may be based on principal component analysis (PCA) mapping or radial basis function (RBF) mapping. Determining a representation may also be referred to as generating a representation. Generating a representation based on PCA mapping may include clustering based on features in the pattern image and/or partial image. Additionally or alternatively, generating a representation may be based on neural network structures suitable for reducing dimensionality. Neural network structures suitable for reducing dimensionality may comprise an encoder and/or a decoder. In an example, the neural network structure may be an autoencoder. In an example, the neural network structure may comprise a convolutional neural network (CNN). The CNN may comprise at least one convolutional layer and/or at least one pooling layer. CNNs may reduce the dimensionality of a partial image and/or an image by applying a convolution, e.g. based on a convolutional layer, and/or by pooling. Applying a convolution may be suitable for selecting features related to material information of the pattern image. In an embodiment, a material model may be suitable for determining an output based on an input. In particular, the material model may be suitable for determining material data based on an image as input. A material model may be a deterministic model, a data-driven model or a hybrid model. The deterministic model, preferably, reflects physical phenomena in mathematical form, e.g. including first-principles models. A deterministic model may comprise a set of equations that describe an interaction between the material and the patterned electromagnetic radiation, thereby resulting in a condition measure, a vital sign measure or the like. A data-driven model may be a classification model. A hybrid model may be a classification model comprising at least one machine-learning architecture with deterministic or statistical adaptations and model parameters. Statistical or deterministic adaptations may be introduced to improve the quality of the results since those provide a systematic relation between empiricism and theory. In an embodiment, the data-driven model may be a classification model. The classification model may comprise at least one machine-learning architecture and model parameters. For example, the machine-learning architecture may be or may comprise one or more of: linear regression, logistic regression, random forest, piecewise linear or nonlinear classifiers, support vector machines, naive Bayes classification, nearest neighbors, neural networks, convolutional neural networks, generative adversarial networks, or gradient boosting algorithms or the like. In the case of a neural network, the material model can be a multi-scale neural network or a recurrent neural network (RNN) such as, but not limited to, a gated recurrent unit (GRU) recurrent neural network or a long short-term memory (LSTM) recurrent neural network. The data-driven model may be parametrized according to a training data set. The data-driven model may be trained based on the training data set. Training the material model may include parametrizing the material model. The term training may also be denoted as learning. The term specifically may refer to a process of building the classification model, in particular determining and/or updating parameters of the classification model.
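By way of non-limiting illustration, a PCA-based lower-dimensional representation could be computed as sketched below; the use of scikit-learn, the patch size and the number of components are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA

# assume a stack of flattened pattern-image patches, one patch per row
patches = np.random.rand(500, 32 * 32)

# project the patches onto the first principal components
pca = PCA(n_components=16)
representation = pca.fit_transform(patches)  # shape (500, 16)

# the lower-dimensional representation may serve as feature vector for a material model
print(representation.shape)
```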
Updating parameters of the classification model may also be referred to as retraining. Retraining may be included when referring to training herein. In an embodiment, the training data set may include at least one image and material information.
In an embodiment, extracting material data from the image with a data-driven model may comprise providing the image to a data-driven model. Additionally or alternatively, extracting material data from the image with a data-driven model may comprise generating an embedding associated with the image based on the data-driven model. An embedding may refer to a lower-dimensional representation associated with the image such as a feature vector. The feature vector may be suitable for suppressing the background while maintaining the material signature indicating the material data. In this context, background may refer to information independent of the material signature and/or the material data. Further, background may refer to information related to biometric features such as facial features. Material data may be determined with the data-driven model based on the embedding associated with the image. Additionally or alternatively, extracting material data from the image by providing the image to a data-driven model may comprise transforming the image into material data, in particular a material feature vector indicating the material data. Hence, material data may further comprise the material feature vector and/or the material feature vector may be used for determining material data. In an embodiment, the authentication process may be validated based on the extracted material data.
In an embodiment, the validating based on the extracted material data may comprise determining if the extracted material data corresponds to desired material data. Determining if the extracted material data matches the desired material data may be referred to as validating. Allowing or declining the user and/or object to perform at least one operation on the device that requires authentication based on the material data may comprise validating the authentication or authentication process. Validating may be based on material data and/or the image. Determining if the extracted material data corresponds to desired material data may comprise determining a similarity of the extracted material data and the desired material data. Determining a similarity of the extracted material data and the desired material data may comprise comparing the extracted material data with the desired material data. Desired material data may refer to predetermined material data. In an example, the desired material data may be skin. It may be determined if the material data corresponds to the desired material data. In the example, the material data may be non-skin material or silicone. Determining if material data corresponds to desired material data may comprise comparing the material data with the desired material data. A comparison of material data with desired material data may result in allowing and/or declining the user and/or object to perform at least one operation that requires authentication. In the example, skin as desired material data may be compared with non-skin material or silicone as material data and the result may be a declination, since silicone or non-skin material is different from skin.
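The comparison of extracted material data with desired material data may be sketched as follows; the string labels and the function name are illustrative assumptions only:

```python
def validate_material(extracted_material: str, desired_material: str = "skin") -> bool:
    """Allow (True) if the extracted material corresponds to the desired material,
    decline (False) otherwise, e.g. for silicone or other non-skin materials."""
    return extracted_material == desired_material

print(validate_material("skin"))      # True  -> authentication may be validated
print(validate_material("silicone"))  # False -> authentication is declined
```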
In an embodiment, the authentication process or its validation may include generating at least one feature vector from the material data and matching the material feature vector with an associated reference template vector for the material.
The authentication unit may be configured for authenticating the user in case the user can be identified and/or if the material data matches the desired material data. The device may comprise at least one authorization unit configured for allowing the user to perform at least one operation on the device, e.g. unlocking the device, in case of successful authentication of the user or declining the user to perform at least one operation on the device in case of non-successful authentication. Thereby, the user may become aware of the result of the authentication.
Biometric recognition typically involves at least one data-driven model which receives recorded biometric data, for example an image, as input and outputs recognition information, for example a classifier indicating if the biometric recognition subject was recognized or not, or a classifier indicating if the image shows a real human or a spoofing mask. A data-driven model requires training datasets allowing adjustment of its parameters such that the data-driven model yields output of sufficient accuracy. The training dataset may comprise an image. The image may be a flood image or a pattern image as described above. The image may comprise the whole biometric recognition subject or parts of the biometric recognition subject, for example the face, the eye or the hand. In case of a pattern image, the image may comprise one or several pattern features, for example a central pattern feature and its direct neighbors. The training dataset may further comprise an indicator indicating if the image shows a real user, i.e. a human being, or a spoofing object, for example a silicone mask. The training datasets may originate from real measurements or they may be generated by the method of the present invention. It is also possible that a training dataset comprises both real measurement datasets and datasets generated by the method of the present invention.
The method of the present invention comprises receiving context information or context data. The term "receiving" may refer to reading the context information from a file, a database or from an input, for example through a user interface. The term "context information" may mean all information associated with the biometric capture device, the biometric capture environment or the biometric recognition subject. Generally, the context information may represent physical properties of the biometric capture device, the biometric capture environment or the biometric recognition subject. Context information associated with the biometric capture device may comprise information about hardware components, in particular those hardware components which may influence the process of biometric recognition. In particular, it may include information about hardware components which are employed to retrieve data from a user which serve to authenticate the user. Information about hardware components may comprise hardware specifications such as specifications associated with the optics of the biometric capture device, in particular specifications associated with the diffractive or refractive properties of the biometric capture device, i.e. with properties of the biometric device which are related to how light is diffracted or refracted in the biometric device. For example, if an image of the user is captured, the biometric capture device may include a light projector, a camera or any component placed in the light ray path, for example optical components like an aperture, a lens, a shutter or a display in case the camera is placed behind the display. In this case context information may be associated with diffractive or refractive properties of the camera, the lens, the shutter or the display.
Context information associated with the biometric capture device may hence comprise information about the type of light projector, for example a laser array, such as a VCSEL array, a laser with a diffractive optical element (DOE), or an LED array; the light projector settings, for example its illumination power, the wavelength of the emitted light, the coherence length of the emitted light or the divergence of the emitted light; or the characteristics of the illumination, for example information related to a projected pattern like shape, size or density of the pattern features.
Context information associated with the biometric capture device may further comprise information about the type of the camera, for example its producer or its name, or its characteristics such as resolution, radiant sensitive area, spectral range of sensitivity, spectral sensitivity at certain wavelengths, half angle, dark current, open circuit voltage, short-circuit current, rise time, fall time, forward voltage, capacitance, temperature coefficient, noise equivalent power, detection limit or thermal resistance from junction to ambient.
Context information associated with the biometric capture device may further comprise information about the type of display, for example its producer or its name, or its characteristics such as pixel size, pitch size of pixels, transmission rate at certain wavelengths, diffraction loss, or the power threshold the display can withstand. Context information associated with the biometric capture environment may comprise all surrounding factors which may influence the process of capturing sensor data required for biometric recognition. Context information associated with the biometric capture environment may comprise information about the distance of a biometric recognition subject to be authenticated to the recording hardware, the angle under which a biometric recognition subject to be authenticated is recorded, the surrounding temperature, or the light conditions, for example the intensity of the background light.
Context information associated with the biometric recognition subject may comprise information about the subject to be recognized. The biometric recognition subject may be a user, for example a human, to be recognized, or it may be a thing, for example a spoofing mask. Context information associated with the biometric recognition subject may comprise material information about the subject, for example skin, wood, paper or silicone; surface information such as skin tone, type and/or degree of make-up, albedo, or reflectance; or personal information, for example the age, size, gender, or ethnic origin of a person.
Context information may comprise categorical variables, such as ordinal or nominal variables, or numeric variables. Generally, numeric variables, in particular metric variables reflecting physical properties, for example light intensity or the focal length of a lens, are preferred. Such variables allow interpolation for values for which no training data is available.
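As a non-limiting illustration, a context vector combining numeric and categorical variables could be assembled as sketched below; the chosen fields, units and encodings are purely illustrative assumptions:

```python
import numpy as np

# hypothetical context information for one biometric capture setup
context = {
    "illumination_power_mw": 5.0,     # numeric (metric) variable
    "wavelength_nm": 940.0,           # numeric (metric) variable
    "display_pixel_pitch_um": 55.0,   # numeric (metric) variable
    "projector_type": "vcsel_array",  # categorical (nominal) variable
}

projector_types = ["vcsel_array", "laser_doe", "led_array"]

def encode_context(ctx: dict) -> np.ndarray:
    numeric = [ctx["illumination_power_mw"], ctx["wavelength_nm"], ctx["display_pixel_pitch_um"]]
    # one-hot encoding for the nominal variable; numeric variables are kept as they are
    one_hot = [1.0 if ctx["projector_type"] == t else 0.0 for t in projector_types]
    return np.array(numeric + one_hot, dtype=np.float32)

print(encode_context(context))
```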
The method of the present invention comprises generating a dataset by providing the context information to a generative model. The generative model is configured to receive context information as input and to output a dataset for training a biometric recognition system based on the context information. The generative model may be parametrized according to the context information. Alternatively, the generative model may comprise multiple statistically dependent conditional submodels or functions, where each submodel or function is specifically conditioned for particular context information. Combinations of these approaches are conceivable. The generative model may be a data-driven model. The generative model may be an artificial neural network. The generative model may be a probabilistic model. The generative model may be a flow-based generative model, for example normalizing flows, or it may be a variational autoencoder (VAE) or a generative adversarial network (GAN). It is also possible to use variations or combinations of such models, for example SurVAE as described by Didrik Nielsen et al. at the 34th Conference on Neural Information Processing Systems (NeurIPS 2020), arXiv:2007.02731v2.
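A high-level, non-limiting sketch of the claimed steps (receiving context information, generating a dataset with a generative model, outputting the dataset) is shown below; the sample() interface of the generative model and the JSON output format are assumptions, and the model could equally be a conditional VAE, GAN or normalizing flow with a different interface:

```python
import json
import numpy as np

def generate_training_datasets(generative_model, context: np.ndarray, n: int = 10):
    """Provide the context information to the generative model and collect n datasets."""
    datasets = []
    for _ in range(n):
        # the model is assumed to expose a sample(context) method returning, e.g.,
        # an image patch together with a real-user/spoof indicator
        image, is_real_user = generative_model.sample(context)
        datasets.append({"image": np.asarray(image).tolist(), "real_user": bool(is_real_user)})
    return datasets

def output_datasets(datasets, path="generated_datasets.json"):
    # outputting: write the datasets to a non-transitory data storage medium
    with open(path, "w") as fh:
        json.dump(datasets, fh)
```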
The generative model may be a trained generative model, i.e. a generative model which is trained with training data. The training data may comprise training datasets comprising biometric capture data, for example an image or a sound file. The training datasets may further comprise annotations representing context information. The context information may serve as a condition for the biometric data such that the model is able to identify correlations between the context information and features of the biometric data. Hence, the generative model may be a conditional model, wherein the context information is used as condition for the model. The training may involve adjusting model parameters with the goal to minimize a loss function. For example, for probabilistic models the Kullback-Leibler divergence between the model's distribution and the empirical distribution may be minimized to maximize the model's likelihood. In case of GANs, the loss function may be a regularized minimax loss which accounts for the convergence behavior. Convergence may be improved by starting the training with low-resolution images and gradually increasing the resolution, or by the two time-scale update rule.
From time to time, additional training data may become available, for example if new images are taken with a new camera. The additional training data may be associated with context information not yet comprised in the training datasets the generative model was previously trained with. The additional training data may be added to the previous training data by encoding the source of the data, for example a dataset ID, within the context. In this way the generative model may be regarded as a conditional model with respect to the underlying training datasets. Generally, this method is advantageous for conditional models. The conditioning of models on different training datasets may also provide an implicit transfer function which allows transferring biometric capture data from a source system to a target system. A biometric capture from the training dataset of the source system may be transferred to the target system in three steps. In the first step, a context-free representation of the biometric capture may be computed by using the context information of the source system. In the second step, the context information may be modified to represent the target system, and in the last step the modified context information may be used to compute a biometric capture in the target system. This approach proves to be particularly effective if the corresponding training datasets can be sufficiently controlled for non-explanatory variables, for example by capturing the same subjects in the same environment.
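The three-step transfer may be sketched as follows, assuming a trained invertible model with forward function f(x, y) and inverse f_inv(z, y); all names are illustrative assumptions:

```python
def transfer_capture(x_source, context_source, context_target, f, f_inv):
    """Transfer a biometric capture from a source system to a target system."""
    # step 1: compute a context-free representation using the source context
    z = f(x_source, context_source)
    # step 2: modify the context information to represent the target system
    y_target = context_target
    # step 3: compute the corresponding biometric capture in the target system
    x_target = f_inv(z, y_target)
    return x_target
```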
In many cases, however, it is very difficult to stabilize the conditions of the biometric captures, for example due to non-availability or aging of the biometric subjects, non-availability of certain make-ups or the controllability of certain parameters like skin properties. In these cases it may be more reasonable not to try to transfer individual biometric captures from one system to another, but to transfer the model in function space and then generate random biometric captures of the target system with given context.
Alternatively, the trained generative model may be fine-tuned only with the additional training data. Fine-tuning may comprise limiting changes of parameters of the generative model to mitigate or avoid catastrophic interference or catastrophic forgetting.
The trained generative model may receive context information and output multiple different datasets for training a biometric recognition system. Variations of the output may arise from randomizing an internal state of the model. For example, the model may comprise a random vector containing values from the training procedure. The random vector may be of higher dimensionality than the biometric capture used as input during training of the model. The random vector may be varied by adding or multiplying small random values. In particular for probabilistic models, the variation may correspond to the probability distribution of corresponding features in the training data. Hence, the generative model may output multiple different datasets corresponding to a probability distribution in the training data the generative model was trained with.
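Varying the internal random vector to obtain multiple different datasets may be sketched as follows; the noise scale and the f_inv interface are illustrative assumptions:

```python
import numpy as np

def sample_variations(f_inv, z_base: np.ndarray, context: np.ndarray,
                      n: int = 5, noise_scale: float = 0.05):
    """Generate n variations by adding small random values to the random vector."""
    outputs = []
    for _ in range(n):
        z = z_base + noise_scale * np.random.randn(*z_base.shape)
        outputs.append(f_inv(z, context))
    return outputs
```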
The method of the present invention comprises outputting the dataset obtained from the model. Outputting can mean writing the dataset to a non-transitory data storage medium, for example into a file or database, displaying it on a user interface, for example a screen, or both. It is also possible to output the dataset through an interface to a cloud system for storage and/or further processing.
The present invention further relates to a non-transient computer-readable medium including instructions that, when executed by one or more processors, cause the one or more processors to perform the method according to the present invention. The term "computer-readable data medium" may refer to any suitable data storage device or computer-readable memory on which is stored one or more sets of instructions (for example software) embodying any one or more of the methodologies or functions described herein. The instructions may also reside, completely or at least partially, within the main memory and/or within the processor during execution thereof by the computer, main memory, and processing device, which may constitute computer-readable storage media. The instructions may further be transmitted or received over a network via a network interface device. Computer-readable data media include hard drives, for example on a server, USB storage devices, CDs, DVDs or Blu-ray discs. The computer program may comprise all functionalities and data required for execution of the method according to the present invention or it may provide interfaces to have parts of the method processed on remote systems, for example on a cloud system.
The invention further relates to a method for granting a user access to a device or application. A device can be a mobile device, for example a smartphone, a tablet computer, a laptop computer or a smartwatch, or it can be a stationary device such as a payment terminal or an access control system, for example to control access to a building, a subway station, an airport gate, a production facility, a car rental site, an amusement park, a cinema, or a supermarket for registered customers. The access control system may further be integrated into a vehicle, for example a car, a train, an airplane, or a ship. An application may refer to a local program, for example installed on a smartphone or a laptop, or a remote service, for example a service on a cloud system to be accessed via the internet. The application may serve several purposes, for example to authorize a payment, to identify the user for a transaction with the public administration, for example to renew a driver's license, or to authorize the user for high-security communication.
Brief Description of the Figures
Figure 1 illustrates an embodiment of the present invention.
Figure 2 illustrates an example of a generative model.
Figure 3 illustrates another example of a generative model.
Figure 4 illustrates an example for a biometric recognition system.
Figure 5 illustrates an exemplary use of the invention.
Figure 6 illustrates an example for a biometric recognition.
Description of Embodiments
Figure 1 illustrates an embodiment of the present invention. A biometric recognition system 110 may receive biometric sensor data 112. Such biometric sensor data 112 may be a pattern image recorded from a user under illumination with patterned infrared light. Alternatively or additionally, biometric sensor data 112 may be an audio recording of the voice of a user. The biometric recognition system 110 may analyze the biometric sensor data 112, including, for example, a comparison to reference data 111. Based on the result, the biometric recognition system 110 may generate an authentication output 120. The authentication output 120 may be a signal indicating if an authorized user has been detected. The biometric recognition system 110 may comprise a data-driven model. Such a model may be or comprise a material model which receives images showing a user or parts of a user under illumination with a structured light pattern and classifies the images into those showing real skin and those showing other materials, for example a silicone spoofing mask. The data-driven model of the biometric recognition system 110 may be a trained model, which is trained with training data. The training data may comprise original training data 101, i.e. data originating from biometric sensors and potentially being labelled manually, and generated training data 104.
Generated training data may be training data which has been generated by a generative model 102. The generative model 102 in turn may be a data-driven model, for example a flow-based generative model. The generative model 102 may be trained with the original training data 101 as well as context information 103. Context information 103 may be related to the biometric capture device of the biometric recognition system 110, for example the lens characteristics of the camera or refractive properties of a transparent display.
Figure 2 illustrates an example of a generative model 210. A biometric capture, for example an image 201, may be recorded by a biometric capture device, for example a camera or a touch-sensitive screen. The image 201 or parts thereof may be used as input to a function f of the generative model 210. Function f is configured to transform the image 201 into a random vector 203. Alternatively, the image 201 may first be transformed into a feature vector which may be used as input. The random vector 203 may provide a context-free representation of the image 201 in terms of independent identically distributed variables. For this purpose the random vector 203 may be designed to have a higher dimensionality than the image 201. In addition, function f may be parametrized according to the context vector 202. The context vector 202 may comprise elements related to the hardware of the biometric recognition system and/or the biometric capture environment. Function f may be parametrized such that the random vector 203 depends both on the image 201 and the context vector 202. Training function f may involve approximating the probability distribution p(x|y) by the probability distribution q(x|y) = p_Z(f(x; y)) · |∂f(x; y)/∂x| and minimizing the Kullback-Leibler divergence D_KL(p, q) → min. The variable x refers to the elements of the image 201 or of the feature vector derived from it. The variable y refers to the elements of the context vector 202. Once the function f is trained, its inverse function f⁻¹ may be calculated. The generative model 210 may generate various feature vectors and images 201 by varying the random vector 203 according to its probability distribution and choosing the context vector 202 as appropriate, i.e. according to the biometric recognition system to be used. Such generated feature vectors or images may be used to train the model of a biometric recognition system, either alone or in combination with real biometric sensor data.
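To make the change-of-variables relation concrete, a minimal conditional affine transformation with a standard-normal prior is sketched below; the linear parametrization of scale and shift from the context vector is an illustrative assumption and not the architecture used in the figures:

```python
import numpy as np

class ConditionalAffineFlow:
    """Minimal sketch of a conditional invertible function f(x; y):
    z = x * exp(s(y)) + t(y), with a standard-normal prior on z."""

    def __init__(self, dim, ctx_dim, seed=0):
        rng = np.random.default_rng(seed)
        # linear maps from the context vector y to elementwise scale and shift parameters
        self.W_s = 0.01 * rng.standard_normal((ctx_dim, dim))
        self.W_t = 0.01 * rng.standard_normal((ctx_dim, dim))

    def forward(self, x, y):
        s, t = y @ self.W_s, y @ self.W_t
        z = x * np.exp(s) + t
        log_det = np.sum(s, axis=-1)  # log |df(x; y)/dx|
        return z, log_det

    def inverse(self, z, y):
        s, t = y @ self.W_s, y @ self.W_t
        return (z - t) * np.exp(-s)

    def negative_log_likelihood(self, x, y):
        z, log_det = self.forward(x, y)
        log_pz = -0.5 * np.sum(z ** 2, axis=-1) - 0.5 * z.shape[-1] * np.log(2 * np.pi)
        # maximizing log q(x|y) = log p_Z(f(x; y)) + log |df/dx| corresponds to
        # minimizing the Kullback-Leibler divergence to the empirical distribution
        return -(log_pz + log_det)

flow = ConditionalAffineFlow(dim=16, ctx_dim=4)
x, y = np.random.rand(16), np.random.rand(4)
z, _ = flow.forward(x, y)
print(np.allclose(flow.inverse(z, y), x))  # True: f is invertible
```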
Figure 3 illustrates another example of a generative model 310. The model is similar to the generative model described for figure 2. The image 301 may show a face of a user under illumination with structured light, for example a hexagonal point pattern. The image 301 may be cropped into multiple patches 302, wherein a patch may show a subset of the pattern features in image 301. For example, one patch may have a spot in the center and the neighboring spots around it. Each patch 302 may be used to train a function f of the generative model. The training involves a context vector 304 comprising information related to the hardware and the environment under which image 301 was recorded. The generative model 310 outputs a random vector 308. After training as described above, the inverse function f⁻¹ may be calculated, which may be used to generate patches 302 depending on the choice of a context vector 304.
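The cropping of an image into patches around pattern spots may be sketched as follows; the patch size and the way the spot centers are obtained are illustrative assumptions:

```python
import numpy as np

def crop_patches(image: np.ndarray, spot_centers, patch_size: int = 32):
    """Crop a square patch around each detected pattern spot, e.g. a central spot
    and its neighbors; spots too close to the image border are skipped."""
    half = patch_size // 2
    patches = []
    for cy, cx in spot_centers:
        if half <= cy < image.shape[0] - half and half <= cx < image.shape[1] - half:
            patches.append(image[cy - half:cy + half, cx - half:cx + half])
    return patches

img = np.random.rand(480, 640)
print(len(crop_patches(img, [(100, 100), (5, 5), (240, 320)])))  # 2: the border spot is skipped
```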
Figure 4 illustrates an example for a biometric recognition system 400. The biometric recognition system 400 may be integrated into a portable device, for example a smartphone, a tablet computer, a laptop computer or a smartwatch. It may comprise a projector 401 which projects light 411 onto a user 410. The light may be infrared light, for example with a wavelength of 850 nm, which is invisible to the user 410. The projected light 411 may be flood light or patterned light, for example a hexagonal point pattern. The projected light 411 may impinge on the face of the user 410, but it may also impinge on the whole head including hair, the upper part of the body including head, neck and shoulders or even the complete body. The reflected light 412 may be recorded by a camera 402 which thereby captures an image of the user 410 illuminated by the projected light. The camera 402 may generate an image in the optical range matching the wavelength emitted by the projector 401, for example in the infrared range. The image may be a grayscale image, i.e. each pixel comprises only the total intensity information, or an RGB image, i.e. different channels indicate the intensity in a particular wavelength range. The image may be passed to a processor 403. The processor 403 may be a microcontroller, i.e. comprising memory and IO controller functionalities, or it may be a CPU which is connected to memory and IO controllers. The processor 403 may execute program code which determines if the user 410 is an authorized user. Such determination may involve vectorizing the image into features. Such a feature vector may be compared to a stored template. If the difference between the feature vector and the stored template is below a predefined threshold, the processor may determine that the user 410 is authorized. The processor 403 may further determine if the image really shows a human rather than a spoofing mask. This may be accomplished by classifying the material of the face in the image by evaluating reflection characteristics in the reflected light. If no skin is detected, the processor may determine that the user 410 in front of the transparent display is not authorized. The processor 403 may be communicatively coupled to memory 404. The memory 404 may be transient memory, for example random access memory (RAM), or persistent memory, for example flash memory. The memory 404 may comprise program code configured to determine if the user 410 is an authorized person as well as templates for registered authorized users. The processor 403 may generate a signal indicating that the user 410 is authorized. The signal may be forwarded to an access control 405 for unlocking the device, granting access to an application, or effecting a secure payment, for example via a wireless communication interface.
The biometric recognition system 400 may comprise a display 406. The display 406 may be transparent for the projected light 411 and the reflected light 412, such that the projector 401 and the camera 402 may be placed behind the display 406. The display 406 may only be transparent at the positions at which the projected light 411 and the reflected light 412 pass through the display 406. Transparent may mean that at least 30 % or at least 50 % of the incident light passes through the transparent display 406.
Figure 5 illustrates an exemplary use of the invention. A biometric recognition system 500a, which may be a smartphone, may have a camera and/or a projector behind a display 501a. The biometric recognition system 500a may comprise a biometric recognition model to authenticate a user which is trained with images obtained through the display 501a. At some point the display 501a may be replaced by a different display 501b in an exchange step 502. This may be part of a version update of the biometric recognition system 500a to a biometric recognition system 500b. In order to adjust the biometric recognition model, a generative model 520 may be provided with context information associated with the display 501b comprising, for example, pixel size, pixel density, pitch size and transparency of display 501b. The generative model 520 may generate training data 530, for example image patches of a face, which is adjusted to the display 501b without the need to newly record images of test persons with the display 501b. The generated training data 530 may be used to train (503) the system for biometric recognition, whereupon the biometric recognition system 500b is ready for use within a short period of time.
Figure 6 illustrates an example for a biometric recognition. The biometric authentication system may be a face authentication system which verifies if a face in front of a camera really belongs to the claimed person, i.e. is neither a different person nor a spoofing mask. The biometric authentication system may be integrated into a portable device such as a smartphone, or in an access system, for example a door opening system of a building or a vehicle.
The face of the person may be illuminated with patterned illumination 601. A camera may record a pattern image 602 of the face under patterned illumination. The pattern image 602 may be used to authenticate the material 603 of the face as described for figures 3 and 4. The material authentication may yield a similarity score. If the similarity score is below a threshold, the authentication may be rejected 630. The face may be illuminated with flood light 611, for example shortly before or after illuminating the face with patterned light 601. A camera may record a flood image 612 of the face under flood illumination 611. The flood image 612 may be used to recognize the identity of the person, for example by extracting features of the flood image and comparing them with a reference database. If the recognition yields an incorrect person, for example a person without access rights, the authentication may be rejected 630. If both the correct person is identified and the material authentication yields a similarity score above the threshold, the person may be authenticated 620.
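The overall decision logic of figure 6 may be sketched as follows; the threshold value and the representation of the identity check are illustrative assumptions:

```python
def authenticate(material_score: float, material_threshold: float,
                 recognized_person: str, claimed_person: str) -> bool:
    """Authenticate only if both the material check and the identity check succeed."""
    if material_score < material_threshold:
        return False  # reject 630: spoofing material suspected
    if recognized_person != claimed_person:
        return False  # reject 630: different or unauthorized person
    return True       # authenticate 620

print(authenticate(0.92, 0.8, "alice", "alice"))  # True
print(authenticate(0.40, 0.8, "alice", "alice"))  # False
```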
Other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed invention, from a study of the drawings, the disclosure, and the appended claims.
For the processes and methods disclosed herein, the operations performed in the processes and methods may be implemented in differing order. Furthermore, the outlined operations are only provided as examples, and some of the operations may be optional, combined into fewer steps and operations, supplemented with further operations, or expanded into additional operations without detracting from the essence of the disclosed embodiments.
In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. A single unit or device may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
Providing in the scope of this disclosure may include any interface configured to provide data. This may include an application programming interface, a human-machine interface such as a display and/or a software module interface. Providing may include communication of data or submission of data to the interface, in particular display to a user or use of the data by the receiving node, entity or interface.
Various units, circuits, entities, nodes or other computing components may be described as "configured to" perform a task or tasks. "Configured to" shall recite structure meaning "having circuitry that" performs the task or tasks in operation. The units, circuits, entities, nodes or other computing components can be configured to perform the task even when the unit/circuit/component is not operating. The units, circuits, entities, nodes or other computing components that form the structure corresponding to "configured to" may include hardware circuits and/or memory storing program instructions executable to implement the operation. The units, circuits, entities, nodes or other computing components may be described as performing a task or tasks, for convenience in the description. Such descriptions shall be interpreted as including the phrase "configured to." Any recitation of "configured to" is expressly intended not to invoke 35 U.S.C. § 112(f) interpretation.
In general, the methods, apparatuses, systems, computer elements, nodes or other computing components described herein may include memory, software components and hardware components. The memory can include volatile memory such as static or dynamic random-access memory and/or nonvolatile memory such as optical or magnetic disk storage, flash memory, programmable read-only memories, etc. The hardware components may include any combination of combinatorial logic circuitry, clocked storage devices such as flops, registers, latches, etc., finite state machines, memory such as static random-access memory or embedded dynamic random-access memory, custom designed circuitry, programmable logic arrays, etc.
Any disclosure and embodiments described herein relate to the methods, the systems, apparatuses, devices, chemicals, materials and computer program elements outlined above and vice versa. Advantageously, the benefits provided by any of the embodiments and examples equally apply to all other embodiments and examples and vice versa. All terms and definitions used herein are understood broadly and have their general meaning.

Claims
1. A computer-implemented method for generating datasets for training a biometric recognition system comprising a. receiving context information associated with hardware components of the biometric capture device, b. generating a dataset for training a biometric recognition system by providing the context information to a generative model which is a data-driven model and which is configured to receive context information as input and to output a dataset for training a biometric recognition system based on the context information, and c. outputting the dataset.
2. The method according to claim 1, wherein the biometric recognition system is an optical face recognition system and wherein the context information comprises information about a light projector and a camera.
3. The method according to claim 2, wherein the generated dataset comprises an image and an indicator indicating if the image shows a real user.
4. The method according to claim 3, wherein the image is a partial image showing a part of the face under illumination with patterned light.
5. The method according to any of the claims 2 to 4, wherein the generated dataset comprises an image of the face under illumination with patterned light, wherein the light is coherent infrared light and wherein the pattern comprises less than 4000 light spots.
6. The method according to any of the claims 1 to 5, wherein the biometric recognition system comprises a transparent display and wherein the context information comprises information about the transparent display.
7. The method according to any of the claims 1 to 6, wherein the generative model is a probabilistic model.
8. The method according to any of the claims 1 to 7, wherein the generative model is a flow-based generative model.
9. Use of the datasets obtained by the method according to any of the claims 1 to 8 for training a data-driven model of a biometric recognition system.
10. A biometric recognition system comprising a data-driven model trained with datasets obtained by the method according to any of the claims 1 to 8.
11. The biometric recognition system according to claim 10, wherein the biometric recognition system is a face recognition system comprising a projector to illuminate the face with patterned light and a camera to record a pattern image of the face under illumination with patterned light.
12. The biometric recognition system according to claim 11, wherein light with a wavelength of 760 nm to 1.5 µm is projected.
13. The biometric recognition system according to claim 12, wherein the biometric recognition system is configured for extracting material data from the pattern image.
14. A system for generating datasets for training a biometric recognition system comprising a. an input for receiving context information associated with hardware components of the biometric capture device, b. a processor for generating a dataset for training a biometric recognition system by providing the context information to a generative model which is a data-driven model and which is configured to receive context information as input and to output a dataset for training a biometric recognition system based on the context information, and c. an output for outputting the dataset.
15. A non-transient computer-readable medium including instructions that, when executed by one or more processors, cause the one or more processors to perform a method comprising a. receiving context information associated with hardware components of the biometric capture device, b. generating a dataset for training a biometric recognition system by providing the context information to a generative model which is a data-driven model and which is configured to receive context information as input and to output a dataset for training a biometric recognition system based on the context information, and c. outputting the dataset.
PCT/EP2024/076805 2023-09-29 2024-09-24 Biometric recognition dataset generation Pending WO2025068196A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP23200763 2023-09-29
EP23200763.3 2023-09-29

Publications (1)

Publication Number Publication Date
WO2025068196A1 true WO2025068196A1 (en) 2025-04-03

Family

ID=88238093

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2024/076805 Pending WO2025068196A1 (en) 2023-09-29 2024-09-24 Biometric recognition dataset generation

Country Status (1)

Country Link
WO (1) WO2025068196A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017222618A1 (en) 2016-06-23 2017-12-28 Apple Inc. Top-emission vcsel-array with integrated diffuser
WO2018091638A1 (en) 2016-11-17 2018-05-24 Trinamix Gmbh Detector for optically detecting at least one object
US20200104570A1 (en) 2018-09-28 2020-04-02 Apple Inc. Network performance by including attributes
WO2021105265A1 (en) 2019-11-27 2021-06-03 Trinamix Gmbh Depth measurement through display

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017222618A1 (en) 2016-06-23 2017-12-28 Apple Inc. Top-emission vcsel-array with integrated diffuser
WO2018091638A1 (en) 2016-11-17 2018-05-24 Trinamix Gmbh Detector for optically detecting at least one object
WO2018091640A2 (en) 2016-11-17 2018-05-24 Trinamix Gmbh Detector for optically detecting at least one object
WO2018091649A1 (en) 2016-11-17 2018-05-24 Trinamix Gmbh Detector for optically detecting at least one object
US20200104570A1 (en) 2018-09-28 2020-04-02 Apple Inc. Network performance by including attributes
WO2021105265A1 (en) 2019-11-27 2021-06-03 Trinamix Gmbh Depth measurement through display

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
AXON SAMUEL: "iPhone X review: Early adopting the future", ARS TECHNICA, 3 November 2017 (2017-11-03), Internet, XP093131116, Retrieved from the Internet <URL:https://arstechnica.com/gadgets/2017/11/iphone-x-review-early-adopting-the-future/> [retrieved on 20240214] *
C. SZEGEDY ET AL.: "Going deeper with convolutions", CoRR, abs/1409.4842, 2014; FLORIAN SCHROFF, DMITRY KALENICHENKO, JAMES PHILBIN: "FaceNet: A Unified Embedding for Face Recognition and Clustering", arXiv:1503.03832
G. B. HUANG, M. RAMESH, T. BERG, E. LEARNED-MILLER: "Labeled faces in the wild: A database for studying face recognition in unconstrained environments", Technical Report 07-49, University of Massachusetts, Amherst, October 2007
JI SHULIN ET AL: "One-way multimodal image-to-image translation for heterogeneous face recognition", JOURNAL OF ELECTRONIC IMAGING, S P I E - INTERNATIONAL SOCIETY FOR OPTICAL ENGINEERING, US, vol. 32, no. 3, 1 May 2023 (2023-05-01), pages 33029, XP060183369, ISSN: 1017-9909, [retrieved on 20230614], DOI: 10.1117/1.JEI.32.3.033029 *
JI YI ET AL: "Purifying Adversarial Images Using Adversarial Autoencoder With Conditional Normalizing Flows", vol. 4, 1 January 2023 (2023-01-01) - 16 February 2024 (2024-02-16), pages 267 - 274, XP093131380, ISSN: 2644-1322, Retrieved from the Internet <URL:https://ieeexplore.ieee.org/ielx7/8782710/10040759/10123077.pdf?tp=&arnumber=10123077&isnumber=10040759&ref=aHR0cHM6Ly9zY2hvbGFyLmdvb2dsZS5jb20v> [retrieved on 20240216], DOI: 10.1109/OJSP.2023.3275053 *
L. WOLF, T. HASSNER, I. MAOZ: "Face recognition in unconstrained videos with matched background similarity", IEEE Conf. on CVPR, 2011
SAFA C MEDIN ET AL: "MOST-GAN: 3D Morphable StyleGAN for Disentangled Face Image Manipulation", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 1 November 2021 (2021-11-01), XP091092390 *
SHAHREZA HATEF OTROSHI ET AL., IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, vol. 45, 2023, pages 14248 - 14265
SHAHREZA HATEF OTROSHI ET AL: "Comprehensive Vulnerability Evaluation of Face Recognition Systems to Template Inversion Attacks via 3D Face Reconstruction", IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, IEEE COMPUTER SOCIETY, USA, vol. 45, no. 12, 4 September 2023 (2023-09-04), pages 14248 - 14265, XP011952728, ISSN: 0162-8828, [retrieved on 20230905], DOI: 10.1109/TPAMI.2023.3312123 *


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24776932

Country of ref document: EP

Kind code of ref document: A1