WO2024211884A1 - Face relighting of avatars with high-quality scan and mobile capture - Google Patents
Info
- Publication number
- WO2024211884A1 (PCT/US2024/023569)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- subject
- images
- view
- light source
- pose
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/50—Lighting effects
- G06T15/506—Illumination models
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
Definitions
- the present disclosure is related generally to the field of generating three-dimensional computer models of subjects in a video capture. More specifically, the present disclosure is related to generating relightable three-dimensional computer models of human faces for use in virtual reality, augmented reality, and mixed reality (VR/AR/MR) applications.
- Animatable photorealistic digital humans are a key component for enabling social telepresence, with the potential to open a new way for people to connect while unconstrained to space and time.
- Relighting an image (e.g., an image of the face of a subject) means that the source of light can be moved up, down, left, or right, the direction of the illumination can be changed, and parameters of the lighting, including its color and intensity, can be altered.
- a majority of relighting applications, including some recent single-image relighting methods, rely on the LightStage dataset, where the shape and material are captured under synchronized cameras and light from multiple light sources.
- a system comprising: a mobile device operable to generate a mobile capture of a subject; a plurality of cameras configured to provide a multi-view scan of the subject under a uniform illumination; and a pipeline configured to perform a plurality of processes using the mobile capture and the multi-view scan to generate a relightable avatar, wherein the mobile capture includes a video captured while the subject is moved relative to a light source.
- the plurality of cameras are fixed around the subject, and the uniform illumination may be provided by a plurality of light sources.
- the plurality of cameras may be configured to simultaneously take images of the multi-view scan.
- the images of the multi-view scan may comprise a coarse geometry of a face including at least eyes, a nose and a mouth of the subject, and hair of the subject.
- the pipeline may comprise a first processing stage configured to determine at least a reflectance, a pose and lighting parameters based on the mobile capture and the multi-view scan.
- the pipeline may further comprise: a second processing stage configured to generate a relightable model of a head of the subject based on the reflectance, the pose and the lighting parameters.
- the pipeline may further comprise: a differentiable renderer configured to combine the reflectance, the pose and the lighting parameters with images of the multi-view scan to provide a rendered image.
- the pose may comprise a camera pose and a head pose.
- the camera pose may comprise a first distance between the mobile device and a fixed point, and wherein the head pose comprises a second distance between the mobile device and the fixed point.
- the light source may comprise a point light source.
- a method comprising: retrieving a plurality of stage images including a plurality of views of a subject; retrieving a plurality of self-images of the subject by using a mobile device while the subject is being moved with respect to a point light source; and generating a three-dimensional (3D) mesh of a head of the subject based on the stage images.
- the method may further comprise generating a texture map for the subject based on the stage images and the self-images.
- the texture map may comprise a view-dependent and illumination-dependent texture map.
- the method may further comprise generating, based on the texture map and the 3D mesh, a view of the subject illuminated by a synthetic light source, and wherein the synthetic light source is associated with an environment in an immersive reality (IR) application.
- the method may further comprise providing the view of the subject to the IR application running on a headset.
- a method comprising: retrieving a plurality of images of a subject from a plurality of view directions; forming a plurality of synthetic views of the subject for each view direction of the plurality of view directions; and training a model with the plurality of images of the subject and the plurality of synthetic views of the subject.
- Retrieving the plurality of images of the subject may be under a plurality of illumination configurations.
- Forming the plurality of synthetic views of the subject may further be performed for each illumination configuration of the plurality of illumination configurations.
- the method may further comprise using a mobile device to capture at least some of the plurality of images of the subject from the plurality of view directions using a single point light source.
- the method may further comprise using a plurality of cameras and a plurality of light sources to provide a uniform illumination to capture at least some of the plurality of images of the subject.
- Another aspect of the disclosure is related to a method including retrieving multiple stage images including several views of a subject and retrieving multiple self-images of the subject by using a mobile device while the subject is being moved with respect to a point light source.
- the method further includes generating a 3D mesh of a head of the subject based on the stage images.
- Yet another aspect of the disclosure is related to a method including retrieving multiple images of a subject from several view directions and forming multiple synthetic views of the subject for each view direction. The method further includes training a model with the multiple images of the subject and the multiple synthetic views of the subject.
- FIG. 1 is a schematic diagram illustrating an example of an architecture for use of a VR headset running an immersive application, according to some aspects of the subject technology.
- FIG. 2 is a schematic diagram illustrating an example of a pipeline in the process of generating a relightable avatar, according to some aspects of the subject technology.
- FIG. 3 is a schematic diagram illustrating an example of an architecture of a relightable avatar model, according to some aspects of the subject technology.
- FIG. 4 is a schematic diagram illustrating an example of images from mobile capture videos for use as input to generate a relightable avatar model, according to some aspects of the subject technology.
- FIG. 5 is a schematic diagram illustrating an example of a texture map and a three-dimensional mesh of a subject face obtained from a multi-camera setup, according to some aspects of the subject technology.
- FIG. 6 is a schematic diagram illustrating an example of a head pose estimation process to generate a relightable avatar model, according to some aspects of the subject technology.
- FIG. 7 is a schematic diagram illustrating an example of a lighting estimation process to render a neutral representation for a relightable avatar model, according to some aspects of the subject technology.
- FIGS. 8A and 8B are schematic diagrams illustrating examples of an irradiance map and a captured scene to determine an illumination direction, according to some aspects of the subject technology.
- FIG. 9 is a schematic diagram illustrating an example of a light source intensity verification process in a relightable avatar model, according to some aspects of the subject technology.
- FIG. 10 is a schematic diagram illustrating an example of an illumination direction verification process in a relightable avatar model, according to some aspects of the subject technology.
- FIGS. 11A, 11B and 11C are schematic diagrams illustrating examples of settings of environment colors for a relightable avatar model, according to some aspects of the subject technology.
- FIG. 12 is a schematic diagram illustrating an example of a relightable avatar rendered from a mobile phone video capture, according to some aspects of the subject technology.
- FIG. 13 is a flow diagram illustrating a method for providing relightable avatars to immersive reality (IR) applications for headset users.
- FIG. 14 is a flow diagram illustrating an example method for training a relightable avatar model for use in IR applications, according to some aspects of the subject technology.
- another mobile capture with an existing high-quality MVS scan system is leveraged to achieve the relighted dataset. This is done by augmenting the MVS data under uniform lighting.
- the disclosed technique further captures indoor mobile video under a point light source to solve for the person-specific reflectance.
- Photorealistic avatars are becoming a trend for IR applications.
- One of the challenges presented is the accurate immersion of the photorealistic avatar in an arbitrary illumination setting, preserving high fidelity with a specific human face. Both the geometry and the texture of a human face are seamlessly reproduced under several illumination conditions.
- Current techniques tend to invest excessive time in training a model for relighting an avatar by using large numbers of image captures under multiple lighting configurations. As a result, the training process can be very long, given the large number of input configurations adopted, and the model itself tends to exhaust the computational capability of typical systems used in IR applications.
- relighting avatar models rely on multi-view stage collected data, where the shape and material are captured under synchronized cameras and lights. However, these methods are limited in the real case, as the personalized reflectance is unknown, and building a system with lighting variations is cumbersome.
- the reconstructed mesh from the MVS system has good accuracy but is captured in uniform lighting.
- Neural networks for a 3D face reconstruction model trained under uniform lighting might not generalize well to real indoor images.
- the disclosed method uses an input from a multiple-camera collection session of a subject under uniform illumination of a neutral gesture. This is complemented with a mobile video scan of the same subject rotating with a fixed, neutral expression in a closed room environment including at least one light source.
- the method of the subject technology extracts fine texture and color information from the collection session and combines this information with the mobile video scan to feed multiple views of the subject with a variable light source orientation for training a neural network algorithm.
- the algorithm corrects for camera orientation and location, and environmental interferences (e.g., miscellaneous object shadows on the subject’s face) in the video scan, to provide an accurate, yet simple to train algorithm for immersing a subject avatar in a synthetic environment.
- the images of the subject are first captured with an MVS system under good, uniform lighting conditions.
- the MVS scan enables determining a good face geometry and albedo.
- the mobile scan videos are captured under a single point light source (e.g., a common floor lamp).
- the relightable model is found by solving for lighting parameters and reflectance for the mobile capture in addition to identifying the head pose in the mobile videos.
- lighting parameters include a light direction and distance, intensity, and a global environment map.
- the system samples colors in a unit sphere for rendered pixels.
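- As an illustration only, the sketch below distributes sample directions quasi-uniformly over the unit sphere using a Fibonacci lattice; the scheme and the function name are assumptions, since the disclosure does not prescribe a particular sampling method.

```python
import numpy as np

def fibonacci_sphere_directions(n_samples: int = 256) -> np.ndarray:
    """Return n_samples unit vectors spread quasi-uniformly over the sphere.

    A Fibonacci lattice is one common way to pick sampling directions for an
    environment map; other schemes would work equally well here.
    """
    i = np.arange(n_samples)
    golden = (1 + 5 ** 0.5) / 2              # golden ratio
    z = 1 - 2 * (i + 0.5) / n_samples        # uniform in z => uniform on the sphere
    phi = 2 * np.pi * i / golden             # azimuth advances by a golden-ratio step
    r = np.sqrt(1 - z ** 2)
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=-1)

directions = fibonacci_sphere_directions(256)   # (256, 3) unit direction vectors
```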
- Reflectance parameters include material properties such as specular intensity and specular roughness.
- the neural network is trained to identify focal length (intrinsic camera parameter) and extrinsic camera parameters including head pose rotation, head pose translation, camera translation, global pixel scale, and updating directions of environment map and sun direction. Camera rotations are captured from mobile captures.
- the neural network training includes loss functions such as Landmarking loss (e.g., key point selection on collected images).
- a loss function is the Euclidean distance between projected points and a corresponding point in a ground truth (e.g., collected) image.
- Some embodiments include a photometric loss as a two-dimensional norm between rendered images and original images, after binary masks eliminate background and hair textures, e.g., to select a face region only.
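- The two loss terms described above can be sketched as follows; the array shapes and function names are illustrative assumptions, not the actual implementation.

```python
import numpy as np

def landmark_loss(projected_pts: np.ndarray, gt_pts: np.ndarray) -> float:
    """Mean Euclidean distance between projected key points and the
    corresponding key points in the captured (ground-truth) image.

    projected_pts, gt_pts: arrays of shape (K, 2) in pixel coordinates.
    """
    return float(np.linalg.norm(projected_pts - gt_pts, axis=-1).mean())

def photometric_loss(rendered: np.ndarray, original: np.ndarray,
                     face_mask: np.ndarray) -> float:
    """L2 norm between rendered and original images restricted to the face
    region; the binary mask removes background and hair pixels.

    rendered, original: (H, W, 3) float images; face_mask: (H, W) in {0, 1}.
    """
    diff = (rendered - original) * face_mask[..., None]
    return float(np.linalg.norm(diff) / max(face_mask.sum(), 1.0))
```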
- the face relighting technique of the subject technology can advantageously be used in various applications including AR, VR and IR devices to enhance device performance. Further, the application of the subject technology can improve existing models that tend to suffer from quality issues and artifacts, and provides an accurate, yet simple to train algorithm for immersing a subject avatar in a synthetic environment.
- FIG. 1 is a schematic diagram illustrating an example of an architecture 100 for use of a VR headset running an immersive application, according to some aspects of the subject technology.
- the VR headset 110 includes a display 112, a frame 114, an IR camera 116, a processor 118, a memory 120, and a communications module 122.
- the display 112 supports eyepieces, at least one of which includes a VR display.
- the memory circuit 120 stores instructions which, when executed by the processor circuit 118, cause the VR headset 110 to perform methods and processes of the subject technology, as disclosed herein.
- the memory circuit 120 includes the immersive application hosted by a remote server 150, which is coupled to a database 160 via a network 140.
- the communications module 122 enables the VR headset 110 to communicate Data-1 wirelessly with a mobile phone 130 (also referred to as a mobile device), via short-range communications (e.g., via Bluetooth, Bluetooth Low Energy (BLE), Wi-Fi, near-field communication (NFC), and the like). Further, the communications module 122 can enable communications with the remote server 150 or the database 160, via the network 140 (e.g., Data-2 and Data-3). In some implementations, the communication with the server 150 and the database 160 can take place with the help of the mobile phone 130. Accordingly, the VR headset 110 may exchange Data-1 with the mobile phone 130, and Data-2 and Data-3 may be communicated between the mobile phone 130, the server 150, the database 160 and the network 140.
- a user 102 of the VR headset 110 may collect a self-video scan while moving relative to a light source using the mobile phone 130.
- the mobile phone 130 then provides the self-scan of the user (Data-2) to the remote server 150.
- the database 160 may include multiple images of the user 102 or a subject (Data-3) collected during a session in a multi-camera, multi-view stage.
- the remote server 150 may also use the stage images and the self-images from the user or the subject to generate a relightable avatar model of the user 102 or subject.
- the relightable avatar is then provided to the immersive application running in the VR headset 110 of the user 102 and other participants in an IR experience.
- Data-1, Data-2, or Data-3 may include a relightable avatar of the user 102 of the VR headset 110 and/or other participants in the IR application. Accordingly, the VR headset 110 receives the relightable avatar and projects it on the display 112. In one or more implementations, the relightable avatar is generated within the VR headset 110 via the processor circuit 118 executing instructions stored in the memory circuit 120. The instructions may include steps in algorithms and processes as disclosed herein. In some embodiments, the VR headset 110 may provide the relightable avatar model (e.g., Data-1) to the mobile phone 130 or remote server 150 (Data-3), which in turn distributes the relightable avatar associated with the VR headset 110 to other participants in the IR application.
- FIG. 2 is a schematic diagram illustrating an example of a pipeline 200 in the process of generating a relightable avatar, according to some aspects of the subject technology.
- the pipeline 200 can generate a relightable avatar that is accurate and can be implemented with reasonable computational capabilities.
- the pipeline 200 combines a mobile capture 210 of a user 202 and a multi-view, high-quality stage scan 212 (hereinafter, multi-view scan 212) of a subject.
- the mobile capture 210 generated by a mobile scan collects images 214 of the subject (e.g., the subject 202) while the subject moves relative to a point light source (e.g., a lamp, a flashlight, the sun), within a closed environment or outdoors.
- the multi-view scan 212 generates multiple images that are used by a processor to form a 3D mesh 216 of the subject’s head under a (fixed) uniform lighting configuration.
- the multi-view scan 212 is performed simply by simultaneously taking a single picture by each of multiple cameras under a uniform lighting condition provided by several light sources and is significantly simpler than the existing MVS systems.
- the multi-view scan 212 can capture more details compared to the existing systems. For example, the multi-view scan 212 can capture a coarse geometry of the face (e.g., eyes, nose, mouth, etc.) and the hair of the subject.
- the pipeline 200, at a first processing stage 218, solves for parameters such as reflectance, lighting, and pose.
- the relighting application 220 generates a relightable model of the subject’s head.
- the stage inputs 216 enable an accurate representation of the subject’s head geometry and albedo (e.g., reflectance of each point in the user’s head under uniform illumination conditions).
- the processing stage 218 may include using a neural network to define the parameters (reflectance, lighting, and pose).
- the relightable model is configured to estimate lighting and reflectance with few coupled parameters using the mobile capture 210, where different lighting conditions are tested by having the subject move relative to a point light source.
- the relightable model is obtained using a neural network approach to resolve pose, lighting, and reflectance attributes of synthetic views of the subject by finding appropriate loss functions to optimize such attributes based on the images collected (e.g., a ground-truth baseline).
- the disclosed technique is less complex as it does not need to solve for geometry and albedo as existing solutions do.
- the use of the multi-view scan 212 provides better accuracy and makes the estimation of lighting and reflectance easier. It is noted that the images of both the multi-view scan 212 and the mobile capture 210 are taken with the same facial expression (e.g., neutral) of the subject.
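- As a rough illustration of this fitting stage, the sketch below keeps the geometry and albedo from the multi-view scan fixed and optimizes pose, lighting, and reflectance parameters against the mobile frames through a differentiable renderer. All names (render_face, project_keypoints, the parameter set) are hypothetical placeholders, and for brevity a single shared pose and light are optimized, whereas per-frame values may be used in practice.

```python
import torch

def fit_relightable_model(frames, landmarks_2d, masks, mesh, albedo,
                          render_face, project_keypoints, n_steps=2000):
    """Solve for head pose, lighting, and reflectance from mobile-capture
    frames, keeping the scanned geometry (`mesh`) and `albedo` fixed.

    `render_face` and `project_keypoints` stand in for a differentiable
    renderer and a key-point projector; `frames`, `landmarks_2d`, and
    `masks` are lists of torch tensors.
    """
    params = {
        "head_rotation": torch.zeros(3, requires_grad=True),      # axis-angle R
        "head_translation": torch.zeros(3, requires_grad=True),   # T
        "light_direction": torch.tensor([0.0, 0.0, 1.0], requires_grad=True),
        "light_intensity": torch.tensor(1.0, requires_grad=True),
        "specular_intensity": torch.tensor(0.5, requires_grad=True),
        "specular_roughness": torch.tensor(0.3, requires_grad=True),
    }
    optimizer = torch.optim.Adam(params.values(), lr=1e-2)
    for _ in range(n_steps):
        optimizer.zero_grad()
        loss = 0.0
        for frame, lmk, mask in zip(frames, landmarks_2d, masks):
            rendered = render_face(mesh, albedo, params)       # (H, W, 3)
            projected = project_keypoints(mesh, params)        # (K, 2)
            loss = loss + ((rendered - frame) * mask[..., None]).pow(2).mean()
            loss = loss + (projected - lmk).norm(dim=-1).mean()
        loss.backward()
        optimizer.step()
    return params
```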
- FIG. 3 is a schematic diagram illustrating an example of an architecture 300 of a relightable avatar model, according to some aspects of the subject technology.
- a first input 310 from a mobile capture of a subject moving relative to a fixed light source is received.
- a second input 350 includes multiple, multi-view, high-quality scans of the same subject.
- the second input 350 may be collected in a staged capture event at a specially designed studio that includes multiple cameras directed at the subject from multiple directions.
- An encoder 320 uses the first input 310 and determines a camera pose 322 (e.g., a first distance between the camera and a fixed point) and lighting conditions 324 (e.g., light intensity, color, and geometry), which is processed to generate an environment map and point light sources 330.
- the encoder 320 further determines a reflectance 326 of the subject’s face and measures a head pose 328 (e.g., second distance between the head and the fixed point) thereof.
- the reflectance 326 is the reaction of the face to the incident light and the surrounding environment and the reflectance for each pixel would be different for different faces.
- the reflectance 326 is used in a reflection model 340, for example, a Blinn-Phong model, which is an empirical model of the local illumination of points on a surface.
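- For reference, the Blinn-Phong model has the standard textbook form sketched below, with specular intensity and shininess (related to roughness) playing the role of the reflectance parameters mentioned earlier; variable names are illustrative.

```python
import numpy as np

def blinn_phong(normal, light_dir, view_dir, albedo, light_color,
                specular_intensity=0.5, shininess=32.0, ambient=0.05):
    """Blinn-Phong shading for a single surface point.

    All direction vectors are unit length; `albedo` and `light_color`
    are RGB triples in [0, 1].
    """
    n = np.asarray(normal, dtype=float)
    l = np.asarray(light_dir, dtype=float)
    v = np.asarray(view_dir, dtype=float)
    h = (l + v) / (np.linalg.norm(l + v) + 1e-8)      # half vector

    diffuse = max(float(np.dot(n, l)), 0.0)
    specular = specular_intensity * max(float(np.dot(n, h)), 0.0) ** shininess

    color = ambient * np.asarray(albedo) \
        + np.asarray(light_color) * (diffuse * np.asarray(albedo) + specular)
    return np.clip(color, 0.0, 1.0)
```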
- a differentiable renderer 360 combines encoded inputs resulting from processing of the first input 310 (by encoder 320) with the second input 350 to provide a rendered image 370.
- the differentiable renderer 360 uses loss functions wherein landmark points, mask profiles, and picture (e.g., color and texture) values are compared between the model and the ground truth images (e.g., from the first and second inputs).
- FIG. 4 is a schematic diagram illustrating example images 400 from mobile capture videos for use as input to generate a relightable avatar model, according to some aspects of the subject technology.
- the images 400 include images 410, 420 and 430, which are pictures of the subject taken at different illumination conditions.
- the image 410 shows a picture of the subject with the fixed light source to the right of the subject.
- the image 420 shows a picture of the subject after the subject has rotated relative to the light source, which now illuminates the subject at a 45° angle to the front/right profile.
- the image 430 shows a picture of the subject where the light source is located to the left of the subject.
- each of the images 410, 420 and 430 is a video image taken as the subject rotates.
- the shading of the facial features has a different pattern in each of the three illumination conditions shown in the images 410, 420 and 430.
- This pattern is determined by the head geometry and the position and distance of the light source relative to the subject (including the subject’s head pose). Accordingly, the relightable avatar model is trained using the encoder 320 of FIG. 3 to learn the reflectance, color, and texture of each portion of the subject’s face for the different illumination conditions and to predict these features for arbitrary illumination conditions.
- FIG. 5 is a schematic diagram illustrating an example of a texture map 520 and a 3D mesh 530 of a subject face obtained from a multi-camera setup, according to some aspects of the subject technology.
- the images of the subject face obtained from the multi-camera setup 510 are used to capture stage images of the subject.
- the texture map 520 includes color and reflectance of each portion of the subject’s face, and the 3D mesh 530 accurately reflects the geometry of the face. Overlaying the texture map 520 on the 3D mesh 530 results in a rendered image showing the subject with a neutral gesture under the uniform illumination conditions used in the multi-camera stage capture.
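- As an illustration of how the texture map and the mesh combine, the sketch below samples per-vertex colors from a texture image at the mesh's UV coordinates (nearest-neighbor lookup); the UV convention and function name are assumptions, and rasterizing the colored mesh into a 2D image is left to a renderer.

```python
import numpy as np

def sample_texture_at_uv(texture: np.ndarray, uv: np.ndarray) -> np.ndarray:
    """Nearest-neighbor lookup of an (H, W, 3) texture at (V, 2) UV coordinates.

    UV coordinates are assumed in [0, 1], with v measured from the bottom of
    the image (a common, but not universal, convention).
    """
    h, w = texture.shape[:2]
    u = np.clip(np.rint(uv[:, 0] * (w - 1)).astype(int), 0, w - 1)
    v = np.clip(np.rint((1.0 - uv[:, 1]) * (h - 1)).astype(int), 0, h - 1)
    return texture[v, u]                    # (V, 3) per-vertex colors
```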
- FIG. 6 is a schematic diagram illustrating an example of a head pose-estimation process 600 to generate a relightable avatar model, according to some aspects of the subject technology.
- the head pose-estimation process 600 includes a first process 610 and a second process 620.
- the first process 610 is a 2D key-point extraction stage that can identify and select a number of key points 616 on the subject’s face 612 based on the 2D input image 602.
- the second process 620 selects additional key points 626 similar to the key points 616 from a 3D mesh 622 as shown on the image 624 with projected key points 626.
- a neural network model is trained to determine a pose estimation P(x3D; R, T) based on a rotation triad R(θx, θy, θz) and a translation vector T(tx, ty, tz).
- a landmark loss function L_lmk estimates the difference in position between the key points 626 in the 3D mesh and the corresponding key points 616 in the 2D input image (ground truth). The R and T parameters are adjusted to minimize the landmark loss to obtain the head pose.
- the landmark loss function L_lmk may be written in the form shown below.
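- One form consistent with the Euclidean-distance description above (the exact expression in the filing may differ) is

$$L_{lmk} = \frac{1}{K}\sum_{k=1}^{K}\left\lVert \Pi\!\left(R\,x_k^{3D} + T\right) - p_k \right\rVert_2 ,$$

where $x_k^{3D}$ denotes the k-th key point selected on the 3D mesh 622, $\Pi(\cdot)$ the camera projection under the pose (R, T), and $p_k$ the corresponding 2D key point 616 in the input image.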
- FIG. 7 is a schematic diagram illustrating an example of a lighting estimation process 700 to render a neutral representation 740 for a relightable avatar model, according to some aspects of the subject technology.
- the lighting estimation process 700 includes a pose-estimation stage 710 and a lighting-related process 720.
- the pose estimation stage 710 renders a head pose 702 with parameters R and T (cf. above).
- a lighting-related process 720 determines directional light parameters and reflectance parameter values based on the 3D mesh scan for each point in the subject’s image 712 (which is a 2D projection of the subject’s avatar).
- a differentiable rendering function 730 compares a color and intensity of each pixel in the rendered image with the input image (ground truth) using a loss function.
- an illumination direction and intensity are determined.
- using a neural network (e.g., a deep neural network), a relightable model is trained to render an image of the subject having a selected pose and illumination from a selected direction, at a selected intensity, and with a selected source color or spectrum.
- the loss function 750 is defined in terms of a masked image difference as shown below, where M(x, y) represents the mask 754, and I(x, y) and R(x, y) represent the input image 752 and the rendered image 756, respectively.
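- A masked L2 formulation consistent with this description (the exact expression in the filing may differ) is

$$L_{photo} = \sum_{x,y} M(x, y)\,\bigl\lVert I(x, y) - R(x, y) \bigr\rVert_2^2 .$$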
- FIGS. 8A and 8B are schematic diagrams illustrating examples of an irradiance map 800A and a captured scene 800B to determine an illumination direction, according to some aspects of the subject technology.
- the irradiance map 800A illustrates different illumination directions used as inputs to a loss function (cf. Eq. 2).
- the illumination direction is determined from self-images captured with a mobile phone to generate a relightable avatar model.
- the captured scene 800B depicts an environment captured to sample an environment color map from images captured with a mobile phone.
- an incoming illumination vector from the irradiance map 800A is selected to match the scene radiance shown in the captured scene 800B.
- FIG. 9 is a schematic diagram illustrating an example of a light source intensity verification process in a relightable avatar model, according to some aspects of the subject technology.
- the input image 910 shows selected key points 912 for estimating a loss function.
- the avatar models are relighted by using the irradiance of the sun as a model, with different degrees of intensity increasing by proportional amounts from left to right.
- the sun intensity levels for the rendered images 920, 930 and 940 are 0.5, 1.0 and 1.5, respectively.
- the intensity level that minimizes the loss function for the identified key points is then selected. Accordingly, the relightable model is trained to produce the corresponding avatar for a sun irradiance in the selected intensity level.
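- A minimal sketch of this verification step, assuming a simple grid search over candidate intensities and a hypothetical `render_with_sun` callable, is shown below; the rendering whose key-point pixels best match the ground-truth image determines the selected intensity.

```python
import numpy as np

def select_sun_intensity(gt_image, keypoints, render_with_sun,
                         candidates=(0.5, 1.0, 1.5)):
    """Pick the sun-irradiance intensity that minimizes the key-point loss.

    `render_with_sun(intensity)` is a placeholder returning an (H, W, 3)
    rendered image; `keypoints` is a (K, 2) array of integer (x, y) pixels.
    """
    xs, ys = keypoints[:, 0], keypoints[:, 1]
    best_intensity, best_loss = None, np.inf
    for intensity in candidates:
        rendered = render_with_sun(intensity)
        loss = np.linalg.norm(rendered[ys, xs] - gt_image[ys, xs], axis=-1).mean()
        if loss < best_loss:
            best_intensity, best_loss = intensity, loss
    return best_intensity
```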
- FIG. 10 is a schematic diagram illustrating an example of an illumination direction verification process 1000 in a relightable avatar model, according to some aspects of the subject technology.
- the illumination direction verification process 1000 includes moving a point source around the avatar’s face until a loss function for selected key points 1012 in the collected image 1010 (ground truth) is minimized.
- the lighting shown in the image 1020 is from the point source that minimizes the loss function.
- FIGS. 11A, 11B and 11C are schematic diagrams illustrating an example of processes for setting environment colors for a relightable avatar model, according to some aspects of the subject technology.
- the environment colors are a projection on the subject’s face of the predominant colors in a scene laid out in front of the subject.
- FIG. 11A illustrates an environment color map 1110, a 2D pixelated field 1120 and a rendered image 1130.
- the environment color map 1110 is a simple panel with two colors (e.g., red and green) horizontally bisecting the plane of the subject’s image.
- the scene colors can be displayed as the 2D pixelated field partitioned in areas where one color is predominant.
- the relightable model projects the 2D pixelated field 1120 onto a neutral texture map (e.g., a 2D, pixelated map) and overlays the colored texture map on the 3D mesh.
- the rendered image 1130 shown is a 2D projection of the resulting 3D mesh.
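- One simple way to picture this projection is to sample the environment color map in the direction of each vertex normal and modulate the neutral texture by the sampled color, as in the diffuse-style sketch below; the lat-long parameterization and function name are assumptions, and the actual model may weight directions differently.

```python
import numpy as np

def tint_texture_with_environment(neutral_colors, normals, env_map):
    """Modulate per-vertex neutral colors by the environment color seen
    along each vertex normal (latitude-longitude environment map lookup).

    neutral_colors: (V, 3); normals: (V, 3) unit vectors (y-up);
    env_map: (H, W, 3) latitude-longitude environment color map.
    """
    h, w = env_map.shape[:2]
    theta = np.arccos(np.clip(normals[:, 1], -1.0, 1.0))    # polar angle from +y
    phi = np.arctan2(normals[:, 2], normals[:, 0])          # azimuth in the x-z plane
    u = np.clip(((phi + np.pi) / (2 * np.pi) * (w - 1)).astype(int), 0, w - 1)
    v = np.clip((theta / np.pi * (h - 1)).astype(int), 0, h - 1)
    return np.clip(neutral_colors * env_map[v, u], 0.0, 1.0)
```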
- the environment color map 1140 is a simple panel with two colors (e.g., purple and yellow) horizontally bisecting the plane of the subject’s image.
- the scene colors can be displayed as the 2D pixelated field partitioned in areas where one color is predominant.
- the relightable model projects the 2D pixelated field 1150 onto a neutral texture and overlays the colored texture map on the 3D mesh.
- the rendered image 1160 shown is a 2D projection of the resulting 3D mesh.
- FIG. 11C illustrates more complex environment color maps 1100C that correspond to different scenes. Shown images 1162, 1164, 1166, 1168 and 1170 are relighted versions of the image 1160, and are associated with a city at noon, at sunrise, at sunset, inside a studio and at night looking through a window, respectively.
- FIG. 12 is a schematic diagram 1200 illustrating an example of a relightable avatar 1220 rendered from a subject’s image 1221 of a mobile phone video capture, according to some aspects of the subject technology.
- FIG. 13 is a flow diagram illustrating a method 1300 for providing relightable avatars to IR applications for headset users.
- the method 1300 may be performed, at least partially, by a processor circuit (e.g., 118 of FIG. 1) reading instructions from a memory circuit (e.g., 120 of FIG. 1).
- the processor circuit and the memory circuit may be in a VR headset (e.g., 110 of FIG. 1), a remote server (e.g., 150 of FIG. 1), a mobile phone (e.g., 130 of FIG. 1) and/or a database (e.g., 160 of FIG. 1), as disclosed herein.
- the VR headset, remote server, mobile phone and database may be communicatively coupled via a network (e.g., 140 of FIG. 1), by a communications module.
- methods consistent with the present disclosure may include at least one or more of the steps in the method 1300 performed in a different order, simultaneously, quasi-simultaneously, or overlapping in time.
- Step 1302 includes retrieving multiple stage images including multiple views (e.g., 214 of FIG. 2) of a subject (e.g., 202 of FIG. 2).
- Step 1304 includes generating a 3D mesh (e.g., 216 of FIG. 2) of a head of the subject based on the stage images.
- Step 1306 includes retrieving multiple self-images (e.g., 210 of FIG. 2) of the subject collected with a mobile device (e.g., used by the subject 202 of FIG. 2) while the subject moves relative to a selected light source.
- Step 1308 includes generating a view-dependent and illumination-dependent texture map (e.g., 520 of FIG. 5) for the subject, based on the stage images and the self-images.
- Step 1310 includes generating, based on the 3D mesh and the view-dependent and illumination-dependent texture map, a view of the subject illuminated by a synthetic light source from an environment in an IR application.
- Step 1312 includes providing the view of the subject to the IR application, running in a headset.
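- Taken together, steps 1302-1312 can be summarized by the skeleton below; the three callables passed in are hypothetical placeholders for the mesh, texture, and rendering stages, not an actual API.

```python
def method_1300(stage_images, self_images, synthetic_light, ir_app,
                build_mesh, build_texture_map, render_view):
    """Skeleton of the method 1300 of FIG. 13 (placeholders throughout)."""
    mesh = build_mesh(stage_images)                          # step 1304
    texture = build_texture_map(stage_images, self_images)   # step 1308
    view = render_view(mesh, texture, synthetic_light)       # step 1310
    ir_app.display(view)                                     # step 1312
    return view
```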
- FIG. 14 is a flow diagram illustrating an example method 1400 for training a relightable avatar model for use in IR applications, according to some aspects of the subject technology.
- the method 1400 may be performed, at least partially, by a processor circuit (e.g., 118 of FIG. 1) reading instructions from a memory circuit (e.g., 120 of FIG. 1).
- the processor circuit and the memory circuit may be in a VR headset (e.g., 110 of FIG. 1), a remote server (e.g., 150 of FIG. 1), a mobile phone (e.g., 130 of FIG. 1) and/or a database (e.g., 160 of FIG. 1), as disclosed herein.
- the VR headset, remote server, mobile phone and database may be communicatively coupled via a network (e.g., 140 of FIG. 1), by a communications module.
- methods consistent with the present disclosure may include at least one or more of the steps in method 1400 performed in a different order, simultaneously, quasi-simultaneously, or overlapping in time.
- Step 1402 includes retrieving multiple images of a subject from multiple view directions and under multiple illumination configurations (see FIGs. 9 and 11C).
- Step 1404 includes forming, using a model (e.g., a neural network model), multiple synthetic views of the subject for each view direction and each illumination configuration.
- Step 1406 includes training the model with the images of the subject and the synthetic views of the subject (e.g., images 900 of FIG. 9).
- the subject technology is directed to a system including a mobile device that is operable to generate a mobile capture of a subject and multiple cameras to provide a multi-view scan of the subject under a uniform illumination.
- the system further includes a pipeline to perform several processes using the mobile capture and the multi-view scan to generate a relightable avatar.
- the mobile capture includes a video captured while the subject is moved relative to a light source.
- the multiple cameras are fixed around the subject, and the uniform illumination is provided by several light sources.
- the multiple cameras are configured to simultaneously take images of the multi-view scan.
- the images of the multi-view scan include a coarse geometry of a face including at least eyes, a nose and a mouth of the subject, and hair of the subject.
- the pipeline includes a first processing stage configured to determine at least a reflectance, a pose and lighting parameters based on the mobile capture and the multi-view scan.
- the pipeline further includes a second processing stage configured to generate a relightable model of a head of the subject based on the reflectance, the pose and the lighting parameters.
- the pipeline further includes a differentiable renderer configured to combine the reflectance, the pose and the lighting parameters with images of the multi-view scan to provide a rendered image.
- the pose includes a camera pose and a head pose.
- the camera pose includes a first distance between the mobile device and a fixed point
- the head pose includes a second distance between the mobile device and the fixed point
- the light source includes a point light source.
- Another aspect of the disclosure is related to a method including retrieving multiple stage images including several views of a subject and retrieving multiple self-images of the subject by using a mobile device while the subject is being moved with respect to a point light source.
- the method further includes generating a 3D mesh of a head of the subject based on the stage images.
- the method further includes generating a texture map for the subject based on the stage images and the self-images.
- the texture map comprises a view-dependent and illumination-dependent texture map.
- the method further includes generating, based on the texture map and the 3D mesh, a view of the subject illuminated by a synthetic light source.
- the synthetic light source is associated with an environment in an immersive reality (IR) application.
- the method further includes providing the view of the subject to the IR application running on a headset.
- Yet another aspect of the disclosure is related to a method including retrieving multiple images of a subject from several view directions and forming multiple synthetic views of the subject for each view direction. The method further includes training a model with the multiple images of the subject and the multiple synthetic views of the subject.
- retrieving the multiple images of the subject is under several illumination configurations.
- forming the plurality of synthetic views of the subject is further performed for each illumination configuration of the several illumination configurations.
- the method further includes using a mobile device to capture at least some of the plurality of images of the subject from the plurality of view directions using a single point light source.
- the method further includes using several cameras and several light sources to provide a uniform illumination to capture at least some of the multiple images of the subject.
- a disclosure relating to such phrase(s) may apply to all configurations, or one or more configurations.
- a disclosure relating to such phrase(s) may provide one or more examples.
- a phrase such as an aspect or some aspects may refer to one or more aspects and vice versa, and this applies similarly to other foregoing phrases.
- aspects of the subject matter described in this disclosure can be implemented to realize one or more of the following potential advantages.
- the described techniques may be implemented to support a range of benefits and significant advantages of the disclosed eye tracking system. It should be noted that the subject technology enables fabrication of a depth-sensing apparatus that is a fully solid-state device with small size, low power, and low cost.
- the phrase “at least one of” preceding a series of items, with the terms “and” or “or” to separate any of the items, modifies the list as a whole, rather than each member of the list (i.e., each item).
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computer Graphics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Geometry (AREA)
- Software Systems (AREA)
- Processing Or Creating Images (AREA)
Abstract
A system of the subject technology includes a mobile device operable to generate a mobile capture of a subject and multiple cameras to provide a multi-view scan of the subject under uniform illumination. The system further includes a pipeline to perform several processes using the mobile capture and the multi-view scan to generate a relightable avatar. The mobile capture includes a video captured while the subject is moved relative to a light source.
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363457961P | 2023-04-07 | 2023-04-07 | |
| US63/457,961 | 2023-04-07 | ||
| US18/628,476 | 2024-04-05 | ||
| US18/628,476 US20240338893A1 (en) | 2023-04-07 | 2024-04-05 | Face relighting of avatars with high-quality scan and mobile capture |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024211884A1 true WO2024211884A1 (fr) | 2024-10-10 |
Family
ID=91030182
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2024/023569 Pending WO2024211884A1 (fr) | 2024-04-08 | Face relighting of avatars with high-quality scan and mobile capture |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2024211884A1 (fr) |
-
2024
- 2024-04-08 WO PCT/US2024/023569 patent/WO2024211884A1/fr active Pending
Non-Patent Citations (3)
| Title |
|---|
| "European Conference on Computer Vision", vol. 13674, 23 October 2022, SPRINGER BERLIN HEIDELBERG, Copenhagen, Denmark, ISSN: 0302-9743, article CHEN ZHAOXI ET AL: "Relighting4D: Neural Relightable Human from Videos", pages: 606 - 623, XP093179968, DOI: 10.1007/978-3-031-19781-9_35 * |
| CHEN LELE ET AL: "High-fidelity Face Tracking for AR/VR via Deep Lighting Adaptation", 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), IEEE, 20 June 2021 (2021-06-20), pages 13054 - 13064, XP034009345, DOI: 10.1109/CVPR46437.2021.01286 * |
| SUN TIANCHENG ET AL: "NeLF: Neural Light-transport Field for Portrait View Synthesis and Relighting", ARXIV (CORNELL UNIVERSITY), 26 July 2021 (2021-07-26), XP093179966, Retrieved from the Internet <URL:https://arxiv.org/pdf/2107.12351> DOI: 10.48550/arxiv.2107.12351 * |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 24724707 Country of ref document: EP Kind code of ref document: A1 |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |