
WO2017131672A1 - Generating pose frontalized images of objects - Google Patents

Generating pose frontalized images of objects

Info

Publication number
WO2017131672A1
WO2017131672A1 (application PCT/US2016/015181, US2016015181W)
Authority
WO
WIPO (PCT)
Prior art keywords
image
frontalized
parameters
pose
differential
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2016/015181
Other languages
French (fr)
Inventor
Florian RAUDIES
Aziza SATKHOZHINA
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hewlett Packard Enterprise Development LP
Original Assignee
Hewlett Packard Enterprise Development LP
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett Packard Enterprise Development LP filed Critical Hewlett Packard Enterprise Development LP
Priority to PCT/US2016/015181
Publication of WO2017131672A1
Current legal status: Ceased

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/75 Determining position or orientation of objects or cameras using feature-based methods involving models
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G06T2207/30201 Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

An example embodiment of the present techniques receives an image of an object and a three-dimensional (3D) model of the object. A pose of the object in the image can be estimated based on estimation of a plurality of parameters. The plurality of parameters may describe a 3D rotation and 3D translation and can be learned via error minimization over a plurality of training samples. A frontalized image of the object can be generated based on the estimated pose and the received 3D model of the object.

Description

GENERATING POSE FRONTALIZED IMAGES OF OBJECTS
BACKGROUND
[0001] Many situations exist in which object depictions in images are detected, identified, or verified. For example, faces of people in images can be identified or verified using various techniques.
BRIEF DESCRIPTION OF THE DRAWINGS
[0002] Certain example embodiments are described in the following detailed description and in reference to the drawings, in which:
[0003] Fig. 1 is a block diagram of an example deep convolutional network that includes a frontalization layer;
[0004] Fig. 2 is a block diagram showing an example frontalization layer;
[0005] Fig. 3 is a block diagram showing an example system that can frontalize objects detected in an image;
[0006] Fig. 4 is a process flow diagram showing an example method of generating pose frontalized images of objects; and
[0007] Fig. 5 is a block diagram showing an example non-transitory, tangible computer-readable medium that stores code for frontalization of faces in images.
DETAILED DESCRIPTION
[0008] As described above, many situations exist in which features of objects in images can be detected, identified, or verified. For example, faces are objects of special importance and face frontalization is an important step for the automated detection, identification, and verification of faces. Detection, as used herein, refers to detecting and locating a face in an image. Identification, as used herein, refers to identifying a person in an image. Verification, as used herein, refers to deciding whether two images depict the same person or not. For any of the three tasks, faces may appear in different poses in an image. For example, a face may point to the lower-right in one image and to the far-left in another image. Frontalization can be used to reconstruct the frontal view of such a face.
[0009] Accordingly, some examples described herein provide a method for the frontalization of faces through 3D rotations and 3D translations that are learned by a deep convolutional network. Such a frontalization can be used to simplify the tasks of face detection, face identification, and face verification. Moreover, face detection, identification, and verification are used in a variety of security tasks. Thus, the techniques herein can be used for user authentication, as it may be less likely that such a system would be fooled by holding a picture of the same person in front of a camera. Conversely, using the present techniques, individuals cannot hide from security cameras by presenting only a side view of their face because the system can transform the side views and generate a frontal pose.
[0010] In addition, the present techniques for face frontalization can be used in a host of security tasks including: border control; a national identity system; access control to devices, homes, cars, secured areas; or face-based analysis of age, sex, ethnicity, or facial expression. Overall, such systems can be used to control access to devices on a small scale as well as access to buildings or countries on a larger scale.
[0011] Further, in some implementations, the present techniques include the use of a separate frontalization layer for performing frontalization tasks in a deep convolutional network. Thus, frontalization and identity tasks are learned through representations in separate layers of the deep convolutional network, rather than being learned by a representation spanning several layers. The breakdown into two subsequent tasks reduces the number of samples required for training, because frontalization is largely independent of the individual face in the image, and improves the accuracy of the identity task, because it can leverage frontal poses, and recognizing identity in frontal poses is easier than recognizing identity in non-frontal poses. In this case, working with fewer training samples also reduces the training time for deep convolutional networks. Moreover, the present techniques do not use landmarks for purposes of detection. Thus, the techniques described herein may use fewer training samples, resulting in a reduced training time. Moreover, the techniques enable identification with higher accuracy because identification is performed using frontal poses.
[0012] Fig. 1 is a block diagram of an example deep convolutional network that includes a frontalization layer. The example deep convolutional network is generally referred to by the reference number 100 and can be implemented using the example computing device 302 of Fig. 3 below.
[0013] The example deep convolutional network 100 includes a plurality of functional layers. For example, the layers may include an image detection layer 102, a frontalization layer 104, a convolution layer 106, a bias layer 108, a non-linearity layer 110, a pooling layer 112, a loss calculation layer 114, and a label layer 116. In some examples, each layer of the deep convolutional network 100 may have hundreds to thousands of parameters that can be initialized and then adjusted through training.
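For illustration only, the forward pass through such a chain of layers can be sketched as a simple composition of layer functions. The minimal Python outline below rests on that assumption; the function and variable names are illustrative and are not defined by this description.

```python
def forward_pass(image, layers):
    """Pass an input image through an ordered chain of layer functions,
    mirroring the left-to-right mapping of Fig. 1.

    `layers` might be, e.g., [frontalize, convolve, add_bias, nonlinearity, pool],
    where each entry is a callable taking and returning an array-like activation.
    """
    activation = image
    for layer in layers:
        activation = layer(activation)
    return activation
```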
[0014] The example deep convolutional network 100 generally provides for mapping of a non-frontal object pose to a frontal object pose from left to right in Fig. 1, and a back projection of errors for training to adjust parameters using transforms that define the frontalization layer 104 from right to left. For example, the image detection layer 102 can detect an image. The image may include one or more objects with non-frontal poses, such as faces. In some examples, the image detection layer 102 can send the image to the frontalization layer 104, as indicated by an arrow 118, for frontalization of the faces. No arrow is shown from the frontalization layer 104 to the image detection layer 102, as no errors are back-projected to the input image.
[0015] The frontalization layer 104 receives a two-dimensional image from the image detection layer 102 as indicated by an arrow 118 and outputs a frontalized two-dimensional image as indicated by an arrow 120. For example, a face frontalization can be performed through a 3D rotation and 3D translation as described herein. The 3D rotation can be parameterized through quaternions to avoid the ambiguities within the rotation space that Euler angles have. A quaternion is a complex number of the form w + xi + yj + zk, where w, x, y, z are real numbers and i, j, k are imaginary units that satisfy i² = j² = k² = ijk = -1. This quaternion representation introduces four 3D rotation parameters, which can be learned. Additional parameters may be used for 3D translation of the face. For example, three additional parameters can be used to represent translation in the x, y, and z axes of a coordinate frame.
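As an illustrative sketch of this seven-parameter pose representation, the following Python (NumPy) snippet converts a quaternion into a rotation matrix and applies the rotation and translation to a set of 3D points. The helper names, and the assumption that the quaternion is normalized before use, are illustrative choices rather than requirements stated here.

```python
import numpy as np

def quaternion_to_rotation(w, x, y, z):
    """Convert a quaternion (w, x, y, z) into a 3x3 rotation matrix."""
    n = np.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n          # normalize to a unit quaternion
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

def apply_pose(points_3d, theta):
    """Apply the seven pose parameters (4 rotation + 3 translation) to Nx3 points."""
    w, x, y, z, tx, ty, tz = theta
    R = quaternion_to_rotation(w, x, y, z)
    return points_3d @ R.T + np.array([tx, ty, tz])
```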
[0016] In some examples, seven parameters may thus be used for face frontalization. These seven parameters transform a face from any given pose into the frontal pose. In some examples, the frontalization layer can receive an image of a face in any pose as input and estimate the pose of this input face through the estimation of the seven parameters. These parameters describe a transform from the face in the image, which can appear in any pose, to a 3D face model that appears in a frontal pose. Furthermore, bilinear interpolation from the 3D face model into the 2D image space can be used to rasterize an image of the face in frontal pose. This rasterized 2D image is the output from the frontalization layer 104 as indicated by an arrow 120. In some examples, the frontalization layer 104 can also receive projected output loss feedback as indicated by an arrow 122. The operation of an example frontalization layer is discussed at greater length with respect to the example frontalization layer of Fig. 2 below.
[0017] The convolutional layer 106 reuses the same parameters within a spatial neighborhood. Reuse, as used herein, is defined as a convolutional operation between the parameters and input values. The parameters may be a set of variables that define the convolutional kernel. In the example of Fig. 1, the input values for the convolutional layer 106 can come from the frontalized image that is received from the frontalization layer 104. The convolutional layer 106 can also receive values such as back-projected errors from the bias layer 108.
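The parameter reuse described above can be sketched as sliding one small shared kernel over the image. The minimal NumPy example below computes a "valid" sliding-window product (a cross-correlation, which is how convolution is commonly implemented in deep networks); the kernel size and array shapes are assumptions made for illustration.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide one shared kernel over the image and sum elementwise products.

    Only "valid" positions are kept, so the output is slightly smaller than
    the input. No kernel flip is applied (cross-correlation).
    """
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out
```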
[0018] The bias layer 108 of the example deep convolutional network 100 computes an additive bias to each value. In some examples, the bias can be a free parameter adjusted through training.
[0019] The non-linearity layer 110 computes the hyperbolic tangent of all received values and has neither a spatial interaction nor any parameters. The hyperbolic tangent provides a non-linearity that is important for the learning and generalization properties of a deep convolutional network. Instead of the hyperbolic tangent, rectified linear units (which set negative values to zero) can be used as well. However, some form of non-linearity is important.
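The two non-linearities mentioned above can be written in a few lines; the elementwise NumPy versions below are a minimal illustration.

```python
import numpy as np

def tanh_layer(values):
    """Hyperbolic tangent non-linearity: squashes each value into (-1, 1)."""
    return np.tanh(values)

def relu_layer(values):
    """Rectified linear unit: sets negative values to zero."""
    return np.maximum(values, 0.0)
```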
[0020] The pooling layer 112 combines values within a spatial neighborhood to form a single output. For example, the pooling layer 112 can use max-pooling, which computes the maximum of four spatially neighboring locations in an image as an output. Such max-pooling can reduce the width and height of an image each by a factor of two. The pooling layer 112 has no trainable parameters. However, the range of max-pooling and the stride length are fixed parameters.
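A minimal sketch of the 2x2 max-pooling described above, assuming a single-channel image (any odd border row or column is simply dropped):

```python
import numpy as np

def max_pool_2x2(image):
    """2x2 max-pooling with stride 2: keeps the maximum of each 2x2 block,
    halving the width and height of the image."""
    h, w = image.shape
    h, w = h - h % 2, w - w % 2                      # drop an odd border row/column if present
    blocks = image[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))
```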
[0021] The loss calculation layer 114 computes an output loss to be projected back to the other layers. For example, the loss calculation layer 114 can compute a soft-max function between a ground-truth label received from the label layer 116, as indicated by an arrow 142, and a predicted label received from, e.g., the pooling layer 112, as indicated by an arrow 136, or, more generally, any other layer preceding the loss calculation layer 114, as indicated by arrows 120, 124, 128, and 132, to generate the output loss. As used herein, a soft-max function refers to a generalization of the logistic function that "squashes" a K-dimensional vector z of arbitrary real values to a K-dimensional vector σ(z) of real values in the range (0, 1) that add up to 1. As used herein, a label refers to a class identifier (CID) for a given data point. A class identifier, as used herein, refers to an identifier for a particular class of objects. A predicted label refers to the output of the deep convolutional network 100 given a particular input. A ground-truth label refers to a label supplied by a user. This output loss can then be projected back through layers 104-112 as indicated by arrows 122, 126, 130, 134, and 138. In particular, as discussed in detail with regard to Fig. 2 below, the frontalization layer 104 may receive the output loss and adjust its parameters accordingly. In some examples, one or more other layers may also receive the output loss and either process the output loss or pass the output loss onto the next layer in the chain.
[0022] Thus, the label layer 116 may receive ground-truth labels from one or more users. For example, to classify bananas and apples from images, images of bananas and apples may be received, and bananas may be assigned a CID of 0 and apples a CID of 1. For example, a user can label images as bananas or apples. A predicted label is the output of the deep convolutional network 100 given an image of a banana or apple as an input. For example, the output can be a CID of 0 or 1.
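A minimal sketch of the soft-max computation, and of turning network scores into a predicted CID for the two-class example above; the score values are hypothetical.

```python
import numpy as np

def softmax(z):
    """Squash a K-dimensional score vector into probabilities in (0, 1) that sum to 1."""
    e = np.exp(z - np.max(z))          # subtract the max for numerical stability
    return e / np.sum(e)

scores = np.array([2.0, 0.5])          # hypothetical network outputs for CIDs 0 and 1
probs = softmax(scores)                # approximately [0.82, 0.18]
predicted_cid = int(np.argmax(probs))  # 0, i.e. "banana" in the example above
```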
[0023] The diagram of Fig. 1 is not intended to indicate that the example deep convolutional network 100 is to include all of the components shown in Fig. 1. Rather, the example deep convolutional network 100 can include fewer or additional components not illustrated in Fig. 1 (e.g., additional layers, etc.) as indicated by dashed lines 140. Moreover, the diagram of Fig. 1 is not intended to indicate that the components of the example deep convolutional network 100 are to be arranged in any particular order. For example, the frontalization layer 104 is shown between the image detection layer 102 and the convolutional layer 106, but can alternatively be plugged into the deep convolutional network 100 between any two layers.
[0024] Fig. 2 is a block diagram showing an example frontalization layer. The frontalization layer is generally referred to by the reference number 200 and can again be implemented using the example computing device 302 of Fig. 3 below.
[0025] The frontalization layer 200 includes a grid generator component 202 and a rasterization component 204. The grid generator component 202 can receive pose parameters Θ, as indicated by an arrow 206, and a supplied 3D model M, as indicated by an arrow 208, and generate a two-dimensional (2D) sample grid G. For example, the 3D model may be retrieved from a database and represent a mean face given a plurality of 3D point cloud reconstructions of thousands of faces. The 2D sample grid G can define the mapping of the 2D non-frontal pose into a 2D frontal pose of the object in the image. The 2D sample grid G can then be sent to the rasterization component 204 as indicated by arrow 210. The rasterization component 204 can receive the non-frontalized object view in a 2D input image I, as indicated by an arrow 212, and rasterize a frontalized object view in the 2D image I', as indicated by an arrow 214. The above steps thus summarize the computations of the forward direction corresponding to left to right in Fig. 1 above.
[0026] The backward direction of Fig. 2, indicated by arrows pointing from right to left, can start with the rasterization component 204 receiving a differential image dI', as indicated by arrow 216. The differential image dI' may contain errors or differentials for the given frontalized object view and object classification. For example, the differential image dI' can include a mismatch between the predicted label and ground-truth label. Thus, differential refers to the partial derivative of the output loss with regard to input values. In the example of Fig. 2, d(loss)/d(I') feeds into the frontalization layer and d(loss)/d(I) is output in the reverse direction to the layer that feeds this frontalization layer. In some examples, through the use of partial derivatives, this differential image dI' can then be linked back to the used sampling grid G, resulting in the differential sampling grid dG. In turn, the differential 2D sampling grid dG can be related back, as indicated by an arrow 218, to differentials in the pose parameters Θ, indicated by the differentials dΘ 222. The differential frontalized 2D image dI' 216 can likewise be related back to a differential 2D image dI as indicated by an arrow 220. In some examples, the pose parameters Θ may only exist in the grid generator, which is indicated in Fig. 2 by the connection of the inward 206 and outward 222 going arrows for pose parameters with a dashed line 224. For example, the pose parameters are internally used in the frontalization layer of the example deep convolutional network of Fig. 1 above.
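For illustration, the forward pass of such a frontalization layer can be sketched in NumPy as a grid generation step followed by bilinear sampling, in the spirit of the description above. The sketch assumes an orthographic projection, a mean 3D model supplying one point per output pixel, and a rotation matrix R (for example obtained from the quaternion conversion sketched earlier) together with a translation t; the backward differentials dG, dΘ, and dI are omitted. All names are illustrative.

```python
import numpy as np

def generate_grid(R, t, model_points, out_h, out_w):
    """Grid generator: pose the mean 3D model (one point per output pixel) and
    keep the projected (x, y) coordinates as sampling locations in the input image."""
    posed = model_points @ R.T + t                   # rotate and translate the 3D model
    return posed[:, :2].reshape(out_h, out_w, 2)     # orthographic drop of the z axis

def bilinear_sample(image, grid):
    """Rasterizer: bilinearly interpolate the input image I at the grid locations,
    producing the frontalized output image I'."""
    h, w = image.shape
    x = np.clip(grid[..., 0], 0, w - 1)
    y = np.clip(grid[..., 1], 0, h - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    top = (1 - wx) * image[y0, x0] + wx * image[y0, x1]
    bottom = (1 - wx) * image[y1, x0] + wx * image[y1, x1]
    return (1 - wy) * top + wy * bottom
```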
[0027] Accordingly, a backward pointing arrow for the dI of the frontalization layer is absent in Fig. 1. This is because the input image has no free parameters that need to be adjusted. In some examples, however, the frontalization layer could alternatively be at other positions in the chain of layers of a deep convolutional network. In those cases, a backward pointing arrow may be used to indicate the relation of the differentials back to the preceding layer.
[0028] In some examples, pose parameters can be learned through error minimization over training samples. For example, a given data set may include 10 individuals, with 25 poses for each individual. For example, the 25 poses may include 5 elevation angles each at 5 azimuth angles. Each individual may also be present 100 times in each pose. Thus, the training set may have a total of 10 x 25 x 100 = 25,000 samples. The frontalization layer can be initialized using the identity transform, assuming only frontal poses. Any non-frontal face that is presented to the network then produces a large error for a given loss function on the output when compared to the frontal view. For example, the non-frontal face may have an azimuth angle of 10 degrees and an elevation angle of 5 degrees. This error can be calculated at another layer and back-propagated through the deep convolutional network to correct the seven parameters in the frontalization layer. For example, the error can be calculated at the loss calculation layer of Fig. 1 above. Thus, the parameters of the frontalization layer can be trained using any suitable training set. Moreover, the learning of individualized parameters for the transform can compensate for any inaccuracies of the mean 3D face model.
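A minimal sketch of the identity initialization and the sample count from the hypothetical data set described above; the parameter layout (four quaternion components followed by three translation components) follows the earlier paragraphs and is an illustrative choice.

```python
import numpy as np

# Identity initialization: identity quaternion (1, 0, 0, 0) and zero translation,
# i.e. the frontalization layer initially assumes every face is already frontal.
theta = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])

# Hypothetical training-set size from the example above.
num_individuals, poses_per_individual, images_per_pose = 10, 25, 100
num_samples = num_individuals * poses_per_individual * images_per_pose  # 25,000
```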
[0029] Fig. 3 is a block diagram of a system that can frontalize objects detected within an image. The system is generally referred to by the reference number 300.
[0030] The system 300 may include a computing device 302 and one or more client computers 304 in communication over a network 306. As used herein, a computing device 302 may include a server, a personal computer, a tablet computer, and the like. As illustrated in Fig. 3, the computing device 302 may include one or more processors 308, which may be connected through a bus 310 to a display 312, a keyboard 314, one or more input devices 316, and an output device, such as a printer 318. The input devices 316 may include devices such as a mouse or touch screen. The processors 308 may include a single core, multiple cores, or a cluster of cores in a cloud computing architecture. In some examples, the processors 308 may include a graphics processing unit (GPU). The computing device 302 may also be connected through the bus 310 to a network interface card (NIC) 320. The NIC 320 may connect the computing device 302 to the network 306.
[0031] The network 306 may be a local area network (LAN), a wide area network (WAN), or another network configuration. The network 306 may include routers, switches, modems, or any other kind of interface device used for interconnection. The network 306 may connect to several client computers 304. Through the network 306, several client computers 304 may connect to the computing device 302. Further, the computing device 302 may access images across network 306. The client computers 304 may be similarly structured as the computing device 302.
[0032] The computing device 302 may have other units operatively coupled to the processor 308 through the bus 310. These units may include non-transitory, tangible, machine-readable storage media, such as storage 322. The storage 322 may include any combinations of hard drives, read-only memory (ROM), random access memory (RAM), RAM drives, flash drives, optical drives, cache memory, and the like. The storage 322 may include a store 324, which can include any images captured or generated in accordance with an embodiment of the present techniques. Although the store 324 is shown to reside on computing device 302, a person of ordinary skill in the art would appreciate that the store 324 may reside on the computing device 302 or any of the client computers 304.
[0033] The storage 322 may include a plurality of modules 326. For example, the modules 326 may be a set of instructions stored on the storage device 322, as shown in Fig. 3. The instructions, when executed by the processor 308, may direct the computing device 302 to perform operations. In some examples, the instructions can be executed by a graphics processing unit (GPU). In some examples, the grid generator 328, rasterizer 330, and/or task performer 332 may be implemented as logic circuits or computer-readable instructions stored on an integrated circuit such as an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), or other type of processor. The grid generator 328 can receive an image of an object and a three-dimensional (3D) model of the object. For example, the 3D model can represent a mean object based on 3D point cloud representations of a plurality of objects. In some examples, the objects may be faces. The grid generator 328 can also estimate a pose of the object in the image based on estimation of a plurality of parameters. For example, the parameters can include four parameters describing a quaternion to represent a 3D rotation and three components of a vector representing a 3D translation. In some examples, the plurality of parameters describe a 3D rotation and 3D translation. For example, the plurality of parameters can be learned via error minimization over a plurality of training samples. In some examples, the grid generator 328 can generate a two-dimensional sample grid based on the estimated pose parameters. The rasterizer 330 can generate a frontalized image of the object based on the estimated pose and the 3D model of the object.
[0034] The task performer 332 can detect an object in another image based on a comparison with the frontalized image. In some examples, the task performer 332 can identify a person in another image based on the frontalized image. In some examples, the task performer 332 can verify that a person appears in another image based on the frontalized image. In some examples, the task performer can detect a frontalized face within an image. The client computers 304 may include storage similar to storage 322. For example, the storage may be the non-transitory, tangible computer-readable medium of Fig. 5 below.
[0035] Fig. 4 is a process flow diagram showing a method of generating frontalized images of objects. The example method is generally referred to by the reference number 400 and can be implemented using the processor 308 of the example system 300 of Fig. 3 above.
[0036] At block 402, the processor receives an image of an object and a three-dimensional (3D) model of the object. For example, the object can be the face of a person. The 3D model can be a face model. For example, the face model may represent a mean face based on 3D point cloud representations of a plurality of faces.
[0037] At block 404, the processor estimates a pose of the object in the image based on estimation of a plurality of parameters. For example, the plurality of parameters describe a 3D rotation and 3D translation and are to be learned via error minimization over a plurality of training samples. In some examples, the processor can receive a differential frontalized image and project back a differential sample grid to generate a plurality of differential pose parameters to be used to generate a set of new pose parameters. In some examples, the processor can receive a differential frontalized image and project back a differential image. For example, the differential frontalized image may be received from a layer of a deep convolutional network and the differential image sent to another layer of the deep convolutional network.
[0038] At block 406, the processor generates a frontalized image of the object based on the estimated pose and the 3D model of the object. For example, the processor can rasterize the frontalized image through a bilinear interpolation of the model of the object into a two-dimensional image space.
[0039] At block 408, the processor detects, identifies, or verifies an object based on the frontalized image. For example, the processor can detect a face in an image based on the frontalized image. In some examples, the processor can identify a person in an image based on the frontalized image. In some examples, the processor can verify that a person appears in an image based on the frontalized image.
[0040] This process flow diagram is not intended to indicate that the blocks of the example method 400 are to be executed in any particular order, or that all of the blocks are to be included in every case. Further, any number of additional blocks not shown may be included within the example method 400, depending on the details of the specific implementation.
[0041] Fig. 5 is a block diagram showing a non-transitory, tangible computer-readable medium that stores code for frontalization. The non-transitory, tangible computer-readable medium is generally referred to by the reference number 500.
[0042] The non-transitory, tangible computer-readable medium 500 may correspond to any storage device that stores computer-implemented instructions, such as programming code or the like. For example, the non-transitory, tangible computer-readable medium 500 may include one or more of a non-volatile memory, a volatile memory, and/or one or more storage devices.
[0043] Examples of non-volatile memory include, but are not limited to, electrically erasable programmable read only memory (EEPROM) and read only memory (ROM). Examples of volatile memory include, but are not limited to, static random access memory (SRAM), and dynamic random access memory (DRAM). Examples of storage devices include, but are not limited to, hard disks, compact disc drives, digital versatile disc drives, and flash memory devices.
[0044] A processor 502 generally retrieves and executes the computer-implemented instructions stored in the non-transitory, tangible computer-readable medium 500 for frontalization of faces in images. A grid generator module 504 can receive an image of a face and a three-dimensional (3D) face model. In some examples, the module 504 can estimate a pose of the face in the image based on estimation of a plurality of parameters. The plurality of parameters can be learned via error minimization over a plurality of training samples. For example, the plurality of parameters may describe a 3D rotation and 3D translation. In some examples, the 3D rotation parameters may be the four components of a quaternion.
[0045] A rasterizer module 506 can generate a frontalized image of the face based on the estimated pose and the 3D face model. In some examples, the rasterizer module 506 can receive a differential frontalized image and project back a differential sample grid to generate a plurality of differential pose parameters to be used to generate a set of new pose parameters. For example, the new pose parameters can be used to generate an updated frontalized image. In some examples, the rasterizer module 506 can rasterize the frontalized image via a bilinear interpolation of the 3D face model into a two-dimensional image space. In some examples, the rasterizer module 506 can receive a differential frontalized image and project back a differential image.
[0046] A task module 508 can detect a face in another image based on the frontalized image. For example, the face can be detected in a particular portion of the other image. In some examples, the task module 508 can also identify a person in another image based on the frontalized image. For example, given a particular person's face stored in a database of frontalized images, the same person's face can be identified in additional images based on the frontalized image. In some examples, the task module 508 can verify that a person appears in another image based on the frontalized image. For example, the task module 508 can be used to compare faces having different poses in two images.
[0047] Although shown as contiguous blocks, the software components can be stored in any order or configuration. For example, if the computer-readable medium 500 is a hard drive, the software components can be stored in noncontiguous, or even overlapping, sectors.
[0048] The present techniques are not restricted to the particular details listed herein. Indeed, those skilled in the art having the benefit of this disclosure will appreciate that many other variations from the foregoing description and drawings may be made within the scope of the present techniques.
Accordingly, it is the following claims, including any amendments thereto, that define the scope of the present techniques.

Claims

What is claimed is:
1. A method for generating pose frontalized images of objects, comprising:
receiving an image of an object and a three-dimensional (3D) model of the object;
estimating, via a processor, a pose of the object in the image based on estimation of a plurality of parameters, wherein the plurality of parameters describe a 3D rotation and 3D translation and are to be learned via error minimization over a plurality of training samples; and
generating, via the processor, a frontalized image of the object based on the estimated pose and the 3D model of the object.
2. The method of claim 1, wherein learning the plurality of parameters further comprises receiving a differential frontalized image and projecting back a differential sample grid to generate a plurality of differential pose parameters to be used to generate a set of new pose parameters.
3. The method of claim 1, wherein generating the frontalized image further comprises rasterizing the frontalized image via a bilinear interpolation of the model of the object into a two-dimensional image space.
4. The method of claim 1, further comprising receiving a differential frontalized image and projecting back a differential image.
5. The method of claim 1, further comprising detecting a face in another image based on the frontalized image, identifying a person in another image based on the frontalized image, verifying that a person appears in another image based on the frontalized image, or any combination thereof.
6. A system for generating pose frontalized images of objects, comprising:
a grid generator to receive an image of an object and a three-dimensional (3D) model of the object and estimate a pose of the object in the image based on estimation of a plurality of parameters, wherein the plurality of parameters describe a 3D rotation and 3D translation and are to be learned via error minimization over a plurality of training samples; and
a rasterizer to generate a frontalized image of the object based on the estimated pose and the 3D model of the object.
7. The system of claim 6, wherein the grid generator is to further generate a two-dimensional sample grid based on the estimated pose parameters.
8. The system of claim 6, further comprising a task performer to detect an object in another image based on a comparison with the frontalized image, identify a person in another image based on the frontalized image, verify that the person appears in another image based on the frontalized image, or any combination thereof.
9. The system of claim 6, wherein the 3D model represents a mean object based on 3D point cloud representations of a plurality of objects.
10. The system of claim 6, wherein the parameters comprise four quaternions representing the 3D rotation and three components of a vector representing the 3D translation.
11. A non-transitory, tangible computer-readable medium, comprising code to direct a processor to:
receive an image of a face and a three-dimensional (3D) face model;
estimate a pose of the face in the image based on estimation of a plurality of parameters, wherein the plurality of parameters describe a 3D rotation and 3D translation and are to be learned via error minimization over a plurality of training samples; and
generate a frontalized image of the face based on the estimated pose and the 3D face model.
12. The non-transitory, tangible computer-readable medium of claim 11, further comprising code to direct the processor to receive a differential frontalized image and project back a differential sample grid to generate a plurality of differential pose parameters to be used to generate a set of new pose parameters.
13. The non-transitory, tangible computer-readable medium of claim 11, further comprising code to direct the processor to rasterize the frontalized image via a bilinear interpolation of the 3D face model into a two-dimensional image space.
14. The non-transitory, tangible computer-readable medium of claim 11, further comprising code to direct the processor to receive a differential frontalized image and project back a differential image.
15. The non-transitory, tangible computer-readable medium of claim 11, further comprising code to direct the processor to detect the face in another image based on the frontalized image, identify a person in another image based on the frontalized image, verify that a person appears in another image based on the frontalized image, or any combination thereof.

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2016/015181 WO2017131672A1 (en) 2016-01-27 2016-01-27 Generating pose frontalized images of objects


Publications (1)

Publication Number Publication Date
WO2017131672A1 true WO2017131672A1 (en) 2017-08-03

Family

ID=59399078

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2016/015181 Ceased WO2017131672A1 (en) 2016-01-27 2016-01-27 Generating pose frontalized images of objects

Country Status (1)

Country Link
WO (1) WO2017131672A1 (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090309878A1 (en) * 2008-06-11 2009-12-17 Sony Corporation Image processing apparatus and image processing method
KR20110088361A (en) * 2010-01-26 2011-08-03 한국전자통신연구원 Front face image generating device and method
WO2013187551A1 (en) * 2012-06-11 2013-12-19 재단법인 실감교류인체감응솔루션연구단 Three-dimensional video conference device capable of enabling eye contact and method using same
US20150161435A1 (en) * 2013-12-05 2015-06-11 Electronics And Telecommunications Research Institute Frontal face detection apparatus and method using facial pose

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TAL HASSNER ET AL.: "Effective Face Frontalization in Unconstrained Images", arXiv:1411.7964v1 [cs.CV], 28 November 2014 (2014-11-28), pages 1 - 10, XP032793884, Retrieved from the Internet <URL:http://arxiv.org/pdf/1411.7964.pdf> *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229313A (en) * 2017-11-28 2018-06-29 北京市商汤科技开发有限公司 Face identification method and device, electronic equipment and computer program and storage medium
CN108229313B (en) * 2017-11-28 2021-04-16 北京市商汤科技开发有限公司 Face recognition method and apparatus, electronic device, computer program, and storage medium
US12475367B2 (en) 2018-05-09 2025-11-18 Beemotion.Ai Ltd Image processing system for extracting a behavioral profile from images of an individual specific to an event
CN111046707A (en) * 2018-10-15 2020-04-21 天津大学青岛海洋技术研究院 Face restoration network in any posture based on facial features
CN111445581A (en) * 2018-12-19 2020-07-24 辉达公司 Mesh reconstruction using data-driven priors
US11995854B2 (en) 2018-12-19 2024-05-28 Nvidia Corporation Mesh reconstruction using data-driven priors
CN111598998A (en) * 2020-05-13 2020-08-28 腾讯科技(深圳)有限公司 Three-dimensional virtual model reconstruction method and device, computer equipment and storage medium
CN111598998B (en) * 2020-05-13 2023-11-07 腾讯科技(深圳)有限公司 Three-dimensional virtual model reconstruction method, three-dimensional virtual model reconstruction device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
US10380788B2 (en) Fast and precise object alignment and 3D shape reconstruction from a single 2D image
Chen et al. Monorun: Monocular 3d object detection by reconstruction and uncertainty propagation
US10769411B2 (en) Pose estimation and model retrieval for objects in images
Murthy et al. Reconstructing vehicles from a single image: Shape priors for road scene understanding
EP3417425B1 (en) Leveraging multi cues for fine-grained object classification
US20140043329A1 (en) Method of augmented makeover with 3d face modeling and landmark alignment
US20190147245A1 (en) Three-dimensional object detection for autonomous robotic systems using image proposals
KR102252439B1 (en) Object detection and representation in images
US12019706B2 (en) Data augmentation for object detection via differential neural rendering
WO2017131672A1 (en) Generating pose frontalized images of objects
WO2022070184A1 (en) System and method for visual localization
CN114359377B (en) A real-time 6D pose estimation method and computer-readable storage medium
Hsiao et al. Flat2layout: Flat representation for estimating layout of general room types
EP4150577A1 (en) Learning articulated shape reconstruction from imagery
Wei et al. Rgb-based category-level object pose estimation via decoupled metric scale recovery
Yang et al. Learning to reconstruct 3d non-cuboid room layout from a single rgb image
WO2020197494A1 (en) Place recognition
Zhang et al. Real time feature based 3-d deformable face tracking
US20250118102A1 (en) Query deformation for landmark annotation correction
Fang et al. MR-CapsNet: a deep learning algorithm for image-based head pose estimation on CapsNet
Tal et al. An accurate method for line detection and manhattan frame estimation
Linåker et al. Real-time appearance-based Monte Carlo localization
Hasan et al. 2D geometric object shapes detection and classification
Turmukhambetov et al. Modeling object appearance using context-conditioned component analysis
WO2017042852A1 (en) Object recognition appratus, object recognition method and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16888404

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16888404

Country of ref document: EP

Kind code of ref document: A1