WO2022198686A1 - Accelerated neural radiance fields for view synthesis - Google Patents
Accelerated neural radiance fields for view synthesis
- Publication number: WO2022198686A1 (PCT/CN2021/083446)
- Authority: WIPO (PCT)
- Legal status: Ceased
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/50—Lighting effects
- G06T15/506—Illumination models
- G06T15/08—Volume rendering
Abstract
Described herein are systems, methods, and non-transitory computer-readable media configured to encode a radiance field of an object onto a machine learning model. A transformation matrix and a projection matrix with which to render an image of the object can be generated. Radiance values and volume density values associated with pixels of the image can be obtained through the machine learning model. The radiance values and the volume density values can be obtained based on position vectors and direction vectors associated with the pixels. Color associated with the pixels can be rendered based on a volume rendering technique. The volume rendering technique can include an application of a depth offset to a depth map associated with the object.
Description
Neural radiance field, or NeRF, is a framework that allows rendering of objects by utilizing radiance fields of the objects. A radiance field of an object can be generally thought of as a representation or visualization of the object in a three-dimensional rendering space through which various renderings, such as images or videos, of the object can be generated. In this way, novel views or video animations of the object can be rendered (e.g., synthesized, constructed, etc. ) based on the radiance field. In general, a NeRF framework with which to generate various renderings of an object can be computationally intensive. For example, after a machine learning model has been trained to encode a radiance field of an object (e.g., a three-dimensional rendering space of the object) onto the machine learning model, computations that are needed to render various views of the object through the radiance field can be taxing and can require high computational effort. As such, under current NeRF frameworks, real-time rendering of objects through radiance fields can be challenging, ineffective, and in some cases, nearly impractical.
SUMMARY
Described herein, in various embodiments, are systems, methods, and non-transitory computer-readable media configured to encode a radiance field of an object onto a machine learning model. A transformation matrix and a projection matrix with which to render an image of the object can be generated. Radiance values and volume density values associated with pixels of the image can be obtained through the machine learning model. The radiance values and the volume density values can be obtained based on position vectors and direction vectors associated with the pixels. Color associated with the pixels can be rendered based on a volume rendering technique. The volume rendering technique can include an application of a depth offset to a depth map associated with the object.
In some embodiments, the radiance field can include a three-dimensional rendering space depicting the object.
In some embodiments, the machine learning model can be a fully connected neural network and can be trained based on a set of images depicting the object from various viewpoints. The set of images can be converted into a continuous five-dimensional representation.
In some embodiments, the radiance field can be discretized into a plurality of voxels. The depth map can be generated based on surface information of the object in the radiance field. The surface information can be represented in triangular mesh.
In some embodiments, the triangular mesh can be generated based on a marching cube technique. The marching cube technique can determine the surface information of the object based on volume density values associated with the plurality of voxels.
In some embodiments, the application of the depth offset to the depth map associated with the object can reduce a number of voxels needed for the volume rendering technique.
In some embodiments, the depth offset can be a pre-defined constant value and can indicate a range of voxels with which to render color of a pixel for the volume rendering technique.
In some embodiments, the application of the depth offset to the depth map associated with the object can reduce a number of iterations needed for the volume rendering technique.
In some embodiments, the transformation matrix and the projection matrix can transform vertices of the object in the radiance field to a two-dimensional image space.
In some embodiments, the machine learning model can output the radiance values and the volume density values based on the position vectors and the direction vectors.
These and other features of the apparatuses, systems, methods, and non-transitory computer-readable media disclosed herein, as well as the methods of operation and functions of the related elements of structure and the combination of parts and economies of manufacture, will become more apparent upon consideration of the following description and the appended claims with reference to the accompanying drawings, all of which form a part of this specification, wherein like reference numerals designate corresponding parts in the various figures. It is to be expressly understood, however, that the drawings are for purposes of illustration and description only and are not intended as a definition of the limits of the invention.
Certain features of various embodiments of the present technology are set forth with particularity in the appended claims. A better understanding of the features and advantages of the technology will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings of which:
FIGURE 1 illustrates an example system, including an object view synthesis module, according to various embodiments of the present disclosure.
FIGURE 2 illustrates an example volume rendering module, according to various embodiments of the present disclosure.
FIGURE 3 illustrates an example radiance field, according to various embodiments of the present disclosure.
FIGURE 4 illustrates a computing component that includes one or more hardware processors and a machine-readable storage media storing a set of machine-readable/machine-executable instructions that, when executed, cause the hardware processor (s) to perform a method, according to various embodiments of the present disclosure.
FIGURE 5 is a block diagram that illustrates a computer system upon which any of various embodiments described herein may be implemented.
The figures depict various embodiments of the disclosed technology for purposes of illustration only, wherein the figures use like reference numerals to identify like elements. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated in the figures can be employed without departing from the principles of the disclosed technology described herein.
Neural radiance field, or NeRF, is a framework that allows rendering of objects by utilizing radiance fields of the objects. A radiance field of an object can be generally thought of as a representation or visualization of the object in a three-dimensional rendering space through which various renderings, such as images or videos, of the object can be generated. In this way, novel views or video animations of the object can be rendered (e.g., synthesized, constructed, etc. ) based on the radiance field. In general, a NeRF framework with which to generate various renderings of an object can be computationally intensive. For example, after a machine learning model has been trained to encode a radiance field of an object (e.g., a three-dimensional rendering space of the object) onto the machine learning model, computations that are needed to render various views of the object through the radiance field can be taxing and can require high computational effort. As such, under current NeRF frameworks, real-time rendering of objects through radiance fields can be challenging, ineffective, and in some cases, nearly impractical.
Described herein is a solution that addresses the problems described above. In various embodiments, an object view synthesis module can be configured to generate real-time renderings of an object based on a NeRF framework that reduces the computational effort needed to render the object in or near real-time. The object view synthesis module can train a machine learning model, such as a fully connected neural network, to encode a radiance field of the object onto the machine learning model. The machine learning model can be trained using a set of images depicting the object from various viewpoints. The radiance field can be a representation or visualization of the object in a three-dimensional rendering space through which various views of the object can be rendered (e.g., synthesized, constructed, etc.). Once the radiance field is encoded onto the machine learning model, the object view synthesis module can query the machine learning model to output radiance values and volume density values at various points associated with the radiance field from various vantage points. In general, a vantage point can be a point in the radiance field (i.e., the three-dimensional rendering space) from which an imaginary light ray can be injected into the radiance field in a particular direction through the point. As such, a vantage point can have a position vector indicating a location in the radiance field and a direction vector indicating a direction. For example, the object view synthesis module can query the machine learning model to output radiance values and volume density values based on a vantage point comprising a position vector concatenated with a direction vector. In this example, the machine learning model can output the radiance values and the volume density values along points in the radiance field through which an imaginary light ray corresponding to the position vector and the direction vector has traveled through the vantage point. In general, a vantage point can correspond to a pixel of an image to be rendered by the object view synthesis module. Therefore, if the object view synthesis module is instructed to render an image comprising 900 pixels (i.e., 30 pixels by 30 pixels), each pixel in the image can be a vantage point with which the object view synthesis module can query the machine learning model to output corresponding radiance values and volume density values. From the radiance values and the volume density values, the color of each pixel can be rendered. Once radiance values and volume density values of pixels of an image to be rendered are obtained, the object view synthesis module can render the image. To reduce computational effort in rendering the image, the object view synthesis module can discretize the radiance field into a plurality of voxels (e.g., three-dimensional graphical units). Volume density values associated with the plurality of voxels can be evaluated using a marching cube technique to extract surface information of the object in the radiance field as triangular mesh. Based on the triangular mesh, the object view synthesis module can generate a depth map of the object. A pre-defined offset can then be added to (i.e., biased onto) the depth map to reduce the computational effort in rendering the image, thereby enabling the object view synthesis module to render the object under the NeRF framework in real-time. These and other features of the object view synthesis module are discussed herein.
FIGURE 1 illustrates an example system 100, including an object view synthesis module 110, according to various embodiments of the present disclosure. The object view synthesis module 110 can be configured to generate real-time renderings of an object based on a NeRF framework. In some embodiments, the object view synthesis module 110 can be implemented, in part or in whole, as software, hardware, or any combination thereof. In some embodiments, the object view synthesis module 110 can be implemented, in part or in whole, as software running on one or more computing devices or systems, such as a cloud computing system. For example, the object view synthesis module 110 can be implemented, in part or in whole, on a cloud computing system to generate images of an object under a NeRF framework from various selected perspectives or viewpoints. Many variations are possible. In some embodiments, the object view synthesis module 110 can comprise a training data preparation module 112, a radiance field encoding module 114, and a volume rendering module 116. Each of these modules is discussed below.
In some embodiments, as shown in FIGURE 1, the system 100 can further include at least one data store 120. The object view synthesis module 110 can be configured to communicate and/or operate with the at least one data store 120. The at least one data store 120 can store various types of data associated with the object view synthesis module 110. For example, the at least one data store 120 can store training data with which to train a machine learning model to encode a radiance field of an object onto the machine learning model. The training data can include, for example, images depicting the object from various viewpoints. For instance, the at least one data store 120 can store a plurality of images depicting a dog to train a machine learning model to encode a radiance field of the dog onto the machine learning model. In some embodiments, the at least one data store 120 can store data relating to radiance fields such as radiance values and volume density values accessible to the object view synthesis module 110. In some embodiments, the at least one data store 120 can store various data relating to triangular mesh and depth maps accessible to the object view synthesis module 110. In some embodiments, the at least one data store 120 can store machine-readable instructions (e.g., codes) that, when executed, cause one or more computing systems to perform training of a machine learning model or render images based on radiance fields. Many variations are possible.
In some embodiments, the training data preparation module 112 can be configured to generate training data with which to train a machine learning model to encode a radiance field of an object onto the machine learning model. In general, a radiance field of an object can be a representation or visualization of the object in a three-dimensional rendering space through which various views of the object can be rendered (e.g., synthesized, constructed, etc.). The training data to encode the radiance field of the object can comprise a set of images depicting the object at various viewpoints. For example, a first image in the set of images can depict the object in a frontal view, a second image in the set of images can depict the object in a side view, a third image in the set of images can depict the object in a top view, etc. To reduce complexity and time to train the machine learning model, the training data preparation module 112 can convert the set of images into a continuous five-dimensional representation. In the continuous five-dimensional representation, each pixel in each image of the set of images can be represented by a position vector and a direction vector. The position vector can be represented in Euclidean coordinates (x, y, z) and the direction vector can be represented in spherical coordinates (θ, φ). As such, each pixel in each image of the set of images can be represented by parameters of x, y, z, θ, and φ, or in five dimensions, and the set of images can be represented by a single continuous string of parameters of x, y, z, θ, and φ. By representing training data in such a manner, the dimensionality of training the machine learning model to encode the radiance field of the object can be greatly reduced, thereby reducing the time needed to train the machine learning model. In some embodiments, position vectors and direction vectors of pixels in an image can be determined based on a pose associated with the image. A pose of an image is an estimation of a position and an orientation (or direction) of an object depicted in the image from a center of a camera from which the image was captured. In one implementation, a pose of an image can be estimated based on a structure from motion (SfM) technique. In another implementation, a pose of an image can be estimated based on a simultaneous localization and mapping (SLAM) technique. Many variations are possible.
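As an illustrative, non-limiting sketch of this conversion, the snippet below maps a single pixel to a five-dimensional sample (x, y, z, θ, φ) under an assumed pinhole camera model with an OpenGL-style camera convention; the helper name pixel_to_ray, the focal-length parameter, and the 4x4 camera-to-world pose matrix (e.g., obtained from an SfM or SLAM pipeline as described above) are assumptions for illustration rather than details taken from this disclosure.

```python
import numpy as np

def pixel_to_ray(u, v, width, height, focal, cam_to_world):
    """Convert pixel (u, v) to a position vector (x, y, z) and a direction
    expressed in spherical coordinates (theta, phi).

    cam_to_world is an assumed 4x4 camera-to-world pose matrix, e.g. estimated
    with an SfM or SLAM technique as described above.
    """
    # Direction of the pixel in camera coordinates (pinhole model, -z forward).
    d_cam = np.array([(u - width * 0.5) / focal,
                      -(v - height * 0.5) / focal,
                      -1.0])
    # Rotate into world coordinates and normalize.
    d_world = cam_to_world[:3, :3] @ d_cam
    d_world = d_world / np.linalg.norm(d_world)
    # The position vector is taken to be the camera center (ray origin).
    m = cam_to_world[:3, 3]
    # Express the direction in spherical coordinates (theta, phi).
    theta = np.arccos(d_world[2])             # polar angle
    phi = np.arctan2(d_world[1], d_world[0])  # azimuth
    return m, (theta, phi)                    # five parameters: x, y, z, theta, phi
```

Concatenating m with (θ, φ) yields the five parameters for that pixel, so a set of posed images can be flattened into the continuous five-dimensional representation described above.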
In some embodiments, the radiance field encoding module 114 can be configured to encode a radiance field of an object onto a machine learning model based on training data provided by the training data preparation module 112. Once the radiance field of the object is encoded onto the machine learning model, the machine learning model can be queried to output radiance values and volume density values associated with points in the radiance field from various vantage points. In general, a vantage point can be a point in the radiance field (i.e., the three-dimensional rendering space) from which an imaginary light ray can be injected into the radiance field in a particular direction through the point. As such, a vantage point can have a position vector indicating a location in the radiance field and a direction vector indicating a direction. As an illustrative example, in some embodiments, the machine learning model can be queried based on a vantage point comprising a position vector concatenated with a direction vector. An imaginary light ray can be generated to travel through the radiance field at a point indicated by the position vector and in a direction indicated by the direction vector. In this example, the machine learning model can output radiance values and volume density values along points in the radiance field through which the imaginary light ray has traveled. In some embodiments, the machine learning model can be implemented using a multilayer perceptron (MLP). In other embodiments, the machine learning model can be implemented using other machine learning models. Many variations are possible. For example, in one implementation, the machine learning model can be implemented using a neural network comprising nine fully-connected perceptron layers. This neural network can be trained to encode a radiance field of an object. In this implementation, the neural network can take a position vector corresponding to a point as input, and output a volume density value and a feature vector for the point at the eighth layer of the neural network. The feature vector can then be concatenated with a direction vector corresponding to the point and passed to the last layer of the neural network to output a radiance value for the point.
In some embodiments, the machine learning model configured by the radiance field encoding module 114 can be expressed as follows:
f (m, s) = [ρ, r]
where m is a position vector at a point in a radiance field, s is a direction vector at the point in the radiance field, ρ denotes volume density values along a direction of the direction vector in the radiance field, and r denotes radiance values along the direction of the direction vector in the radiance field. In this regard, the machine learning model can be expressed as a function, f, that takes the position vector and the direction vector as inputs and outputs the radiance values and the volume density values along the direction of the direction vector in the radiance field. During training of the machine learning model, parameters associated with the machine learning model (e.g., weights of the neural network) can be optimized, through back-propagation, such that f converges to a reference radiance field (e.g., the ground truth radiance field). Once f converges to the reference radiance field within some thresholds, training for the machine learning model is deemed complete and the parameters for the machine learning model become fixed. The trained machine learning model can output radiance values (e.g., r) and volume density values (e.g., ρ) along any direction corresponding to any point (e.g., m, s) in the radiance field.
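As an illustrative, non-limiting sketch, the network below follows the nine-layer description above: the encoded position vector is the input, the eighth layer emits a volume density value and a feature vector, and the encoded direction vector is concatenated with that feature vector before the final layer outputs radiance. The layer width of 256, the ReLU and sigmoid activations, and the default input sizes (chosen so the two encoded inputs total 76 dimensions, consistent with the positional encoding discussed below) are assumptions rather than specifics of this disclosure.

```python
import torch
import torch.nn as nn

class RadianceFieldMLP(nn.Module):
    """Nine fully-connected layers: the eighth layer produces a volume density
    and a feature vector; the direction encoding is concatenated with that
    feature vector before the ninth layer outputs radiance (RGB). Widths and
    activations are assumptions, not taken from the disclosure."""

    def __init__(self, pos_dim=60, dir_dim=16, width=256):
        super().__init__()
        layers = [nn.Linear(pos_dim, width)]
        for _ in range(6):
            layers.append(nn.Linear(width, width))
        self.trunk = nn.ModuleList(layers)           # layers 1-7
        self.layer8 = nn.Linear(width, width + 1)    # feature vector + volume density
        self.layer9 = nn.Linear(width + dir_dim, 3)  # radiance (RGB)

    def forward(self, m_enc, s_enc):
        h = m_enc
        for layer in self.trunk:
            h = torch.relu(layer(h))
        out = self.layer8(h)
        density = torch.relu(out[..., :1])           # rho >= 0
        feature = out[..., 1:]
        radiance = torch.sigmoid(self.layer9(torch.cat([feature, s_enc], dim=-1)))
        return density, radiance
```

During training, the parameters of such a network would be optimized by back-propagation against reference views until f converges, as described above.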
In some embodiments, to prevent over-smoothing of the radiance field, a positional encoding technique can be applied during training of the machine learning model. The positional encoding technique can transform the position vector, m, and the direction vector, s, from a low-dimensional space (e.g., five dimensions) to a higher-dimensional space to increase fidelity of the radiance field. In some embodiments, the positional encoding technique can be based on a sinusoidal expression as shown:
γ (x) = [sin (2^0 x), cos (2^0 x), sin (2^1 x), cos (2^1 x), ..., sin (2^L x), cos (2^L x)]
where L is a hyper-parameter. In one implementation, L is set to 9 for γ (m) and 4 for γ (s) . In this implementation, the positional encoding technique can allow the machine learning model to take 76-dimension vectors as inputs instead of 5-dimension vectors (e.g., x, y, z, θ, φ) to output radiance values and volume density values. In this way, the machine learning model can be biased toward encoding the radiance field in higher fidelity.
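As an illustrative, non-limiting sketch, the expression above can be implemented as shown below. Applying the encoding per vector component, and the exact frequency count used to reach the 76-dimension input mentioned above, are implementation details this sketch assumes rather than details taken from this disclosure.

```python
import numpy as np

def positional_encoding(x, L):
    """gamma(x) = [sin(2^0 x), cos(2^0 x), ..., sin(2^L x), cos(2^L x)],
    applied to every component of x to lift it to a higher-dimensional space."""
    x = np.asarray(x, dtype=np.float64)
    bands = 2.0 ** np.arange(L + 1)           # frequencies 2^0 ... 2^L
    scaled = x[..., None] * bands             # shape (..., dim, L + 1)
    enc = np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)
    return enc.reshape(*x.shape[:-1], -1)     # flatten per-component encodings

# Example: encode a position vector m (L = 9) and a direction vector s (L = 4).
m_enc = positional_encoding([0.1, -0.3, 0.7], L=9)
s_enc = positional_encoding([1.2, 0.4], L=4)
```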
In some embodiments, the volume rendering module 116 can be configured to generate renderings (e.g., images, videos, etc. ) of an object based on a radiance field of the object. The volume rendering module 116 can generate a depth map of the object based on volume density values associated with the radiance field. The volume rendering module 116 can generate a rendering of the object for each given transformation matrix and projection matrix. To generate the rendering of the object in or near real-time, the volume rendering module 116 can be biased with a predefined offset value added to the depth map of the object. This bias reduces a number of summation iterations needed to render color of each pixel in the rendering. The volume rendering module 116 will be discussed in further detail with reference to FIGURE 2 herein.
FIGURE 2 illustrates an example volume rendering module 200, according to various embodiments of the present disclosure. In some embodiments, the volume rendering module 116 of FIGURE 1 can be implemented as the volume rendering module 200. As shown in FIGURE 2, in some embodiments, the volume rendering module 200 can include a depth map generation module 202, a perspective generation module 204, and an image rendering module 206. Each of these modules will be discussed below.
In some embodiments, the depth map generation module 202 can be configured to generate a depth map of an object based on a radiance field of the object. Initially, the depth map generation module 202 can evaluate volume density values of points associated with the radiance field of the object. The volume density values can represent opacities of the points. The depth map generation module 202 can discretize the radiance field into a plurality of voxels. A voxel is a unit of graphic information in a three-dimensional space, similar to a pixel in a two-dimensional image. The depth map generation module 202 can obtain a volume density value associated with each voxel by querying a machine learning model that encoded the radiance field to output the volume density value for the voxel. Based on the volume density values of the plurality of voxels, the depth map generation module 202 can, using a marching cube technique, generate surfaces (e.g., isosurfaces) for the object, and the surfaces can be represented as triangular meshes. In general, voxels corresponding to surfaces of an object have approximately equal volume density values, voxels corresponding to regions near the surfaces of the object have high volume density values, and voxels corresponding to regions away from the surfaces of the object have low volume density values. Based on these principles, the triangular meshes for the object can be generated. Correspondingly, the depth map of the object can be generated based on the triangular meshes using conventional techniques.
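As an illustrative, non-limiting sketch of this step, the snippet below discretizes the rendering space into a voxel grid, queries an already-trained model for a volume density per voxel, and extracts a triangular mesh with a marching-cubes routine (here scikit-image's measure.marching_cubes is used as a stand-in for "a marching cube technique"); the grid resolution, spatial bound, density threshold, and the query_density helper are assumptions for illustration.

```python
import numpy as np
from skimage import measure

def extract_mesh(query_density, resolution=128, bound=1.0, level=10.0):
    """Discretize the radiance field into a voxel grid, query the trained
    model for a volume density per voxel, and extract a triangular mesh."""
    # Build a regular grid of voxel centers spanning the rendering space.
    xs = np.linspace(-bound, bound, resolution)
    grid = np.stack(np.meshgrid(xs, xs, xs, indexing="ij"), axis=-1)  # (R, R, R, 3)
    # query_density is assumed to map (N, 3) positions to (N,) densities.
    densities = query_density(grid.reshape(-1, 3)).reshape(resolution, resolution, resolution)
    # Marching cubes finds the isosurface where the density crosses `level`.
    verts, faces, _, _ = measure.marching_cubes(densities, level=level)
    # Map vertex indices back to world coordinates.
    verts = verts / (resolution - 1) * (2 * bound) - bound
    return verts, faces
```

The resulting triangular mesh can then be used to produce the depth map, for example by rasterizing it as sketched after the next paragraph.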
In some embodiments, the perspective generation module 204 can be configured to generate a transformation matrix and a projection matrix with which to render an image of an object through a radiance field of the object. The transformation matrix and the projection matrix can be generated based on a point (e.g., a perspective) associated with the radiance field looking at the object. For example, a radiance field can depict an artifact. In this example, a point in the radiance field can be selected such that the artifact is positioned and oriented in the radiance field from the perspective of the point (e.g., “framing” the artifact) . In this regard, the transformation matrix and the projection matrix are transformations that, together, can transform vertices of the object from the radiance field to a two-dimensional image space. Through the image space, the image of the object can be rendered.
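As an illustrative, non-limiting sketch of how such matrices can map the object into a two-dimensional image space, the snippet below projects mesh vertices with a 4x4 transformation matrix and a 4x4 projection matrix and keeps the nearest depth per pixel as a simple depth map; the matrix conventions and the per-vertex (rather than per-triangle) rasterization are assumptions made to keep the example short.

```python
import numpy as np

def render_depth_map(verts, transform, projection, width, height):
    """Project mesh vertices to the image plane and keep the nearest depth
    per pixel (a point-based stand-in for a full rasterizer)."""
    # Homogeneous coordinates: world -> camera -> clip space.
    v = np.concatenate([verts, np.ones((len(verts), 1))], axis=1)
    clip = (projection @ (transform @ v.T)).T
    ndc = clip[:, :3] / clip[:, 3:4]                 # perspective divide
    # Normalized device coordinates -> pixel coordinates.
    px = ((ndc[:, 0] + 1) * 0.5 * (width - 1)).astype(int)
    py = ((1 - ndc[:, 1]) * 0.5 * (height - 1)).astype(int)
    # Treat the w component as camera-space depth (standard perspective matrix assumed).
    depth = clip[:, 3]
    depth_map = np.full((height, width), np.inf)
    inside = (px >= 0) & (px < width) & (py >= 0) & (py < height) & (depth > 0)
    for x, y, d in zip(px[inside], py[inside], depth[inside]):
        depth_map[y, x] = min(depth_map[y, x], d)
    return depth_map
```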
In some embodiments, the image rendering module 206 can be configured to render an image of an object for a given transformation matrix and a given projection matrix. In general, the image rendering module 206 can utilize volume rendering techniques to render an image. In such techniques, an image can be rendered (constructed) by compositing pixels in an image space as indicated by the transformation and projection matrices. The image rendering module 206 can render color of each pixel based on an absorption and emission particle model in which the color of a pixel is determined based on a light ray injected into a radiance field of the object. In some embodiments, the image rendering module 206 can render color of a pixel based on a numerical quadrature expression shown:
$$\hat{C}(l) = \sum_{i=1}^{N} \exp\!\left(-\sum_{j=1}^{i-1} \rho_j \delta_j\right) \left(1 - \exp(-\rho_i \delta_i)\right) r_i$$

where $\hat{C}(l)$ is a color value of a pixel, $\rho_i$ is a volume density value of an $i$-th voxel obtained through a machine learning model that encoded the radiance field, $r_i$ is a radiance value of the $i$-th voxel obtained through the machine learning model that encoded the radiance field, $\delta_i$ is a Euclidean distance between the $i$-th voxel and an $(i+1)$-th voxel, and $N$ is a total number of voxels along a light ray $l$. In volume rendering techniques, $N$ can be 196. As such, as shown in the expression, rendering a color value of a pixel of an image can take up to 196 summation iterations (i.e., $N$ = 196) and can take into account radiance values and volume density values of 196 voxels. Such volume rendering techniques can therefore be computationally impractical for rendering an image in real time.
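For illustration only, the quadrature for a single light ray can be sketched as follows, assuming the per-sample densities, radiances, and spacings have already been obtained from the machine learning model; the array names are assumptions of the sketch:

```python
# Minimal sketch of the quadrature above for one light ray.
# densities: rho_i, shape (N,); radiances: r_i as RGB rows, shape (N, 3); deltas: delta_i, shape (N,)
import numpy as np

def composite_ray(densities, radiances, deltas):
    alphas = 1.0 - np.exp(-densities * deltas)                     # per-sample opacity
    # Transmittance: fraction of light surviving up to (but not including) sample i.
    transmittance = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])
    weights = transmittance * alphas                               # contribution of each sample
    return (weights[:, None] * radiances).sum(axis=0)              # pixel color (RGB)

# Example with N = 196 placeholder samples.
N = 196
color = composite_ray(np.random.rand(N), np.random.rand(N, 3), np.full(N, 0.01))
```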
In some embodiments, to reduce the computational effort of rendering an image of an object, a depth offset can be added to a depth map of the object as generated by the depth map generation module 202. In general, the depth map can provide information relating to distributions of volume density values of voxels in a radiance field of the object. Because voxels corresponding to surfaces have relatively high volume density values while voxels corresponding to non-surface regions have low volume density values, only voxels that have high volume density values are used to compute color values of pixels during image rendering, thereby greatly reducing the number of summation iterations (e.g., N) needed to compute pixel color. The depth offset can be added to the depth map to determine a range of voxels with which to render color of a pixel. For example, a light ray corresponding to a pixel can be injected into a radiance field of an object. This light ray can travel through 100 voxels in the radiance field. In this example, the 60th voxel is a voxel with a high volume density value. In this example, a depth offset can be added to a depth map of the object, and this depth offset can correspond to 5 voxels (e.g., a range of voxels) in the radiance field. Therefore, to compute a color value for the pixel, radiance values and volume density values of the 60th, 61st, 62nd, 63rd, 64th, and 65th voxels are summed using the numerical quadrature expression above. In some embodiments, the depth offset can be a pre-defined constant value. In other embodiments, the depth offset can be user adjusted. In some embodiments, adding a depth offset can reduce the number of voxels needed to compute a color value of a pixel from 196 to 8. As such, in such embodiments, the image rendering module 206 can render a color value of a pixel in 8 summation iterations, thereby making real-time rendering of an image under a NeRF framework possible.
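As a non-limiting sketch, the narrow band of samples selected by the depth offset can be generated as follows; the names `surface_depth` and `offset` stand in for the depth-map lookup and the pre-defined depth offset and are assumptions of the sketch:

```python
# Minimal sketch of restricting the quadrature to a small band of samples starting
# at the surface depth given by the depth map, instead of marching through every voxel.
import numpy as np

def band_limited_samples(ray_origin, ray_dir, surface_depth, offset, n_samples=8):
    # Sample only from the surface depth to the surface depth plus the offset.
    t_vals = np.linspace(surface_depth, surface_depth + offset, n_samples)
    points = ray_origin[None, :] + t_vals[:, None] * ray_dir[None, :]     # (n_samples, 3)
    deltas = np.full(n_samples, offset / max(n_samples - 1, 1))           # spacing delta_i
    return points, deltas   # query the model at `points`, then composite with `deltas`
```

The returned sample points and spacings can then be composited with the quadrature sketched earlier.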
FIGURE 3 illustrates an example radiance field 300, according to various embodiments of the present disclosure. As discussed above, the radiance field 300 can be encoded onto a machine learning model by training the machine learning model with a set of images. As shown in FIGURE 3, the radiance field 300 can include an object 302 in a three-dimensional rendering space 304. An image 306 can be generated based on the radiance field 300 through a volume rendering technique. In the volume rendering technique, a pixel 308 of the image 306 can be associated with a position vector and a direction vector. The position vector can indicate a point in the three-dimensional rendering space 304 at which to generate an imaginary light ray 310. A direction of the imaginary light ray 310 can be indicated by the direction vector. The imaginary light ray 310 can be injected into the three-dimensional rendering space 304 along the direction vector. Points 312 in the radiance field 300 through which the imaginary light ray 310 travels can provide radiance values and volume density values. By introducing an offset to a depth map associated with the object 302 (not shown) , the number of points 312 needed to render color of the pixel 308 can be reduced. Because the number of points to iterate over is reduced, the time needed to render the color of the pixel 308 is also reduced, thereby allowing the image 306 to be rendered in or near real-time. In FIGURE 3, for simplicity, the image 306 is rendered as a frontal view of the object 302. Other views of the object 302 can also be rendered through the volume rendering technique described herein. For example, an isometric view of the object 302 can also be rendered through the volume rendering technique.
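By way of a non-limiting illustration, the position vector and direction vector of such an imaginary light ray can be derived for a pixel as sketched below, assuming a pinhole camera with an intrinsic matrix `K` and a camera-to-world matrix `cam2world`; these names are illustrative assumptions:

```python
# Minimal sketch: turn a pixel coordinate (u, v) into a ray origin (position vector)
# and a unit ray direction (direction vector) in world space.
import numpy as np

def pixel_ray(u, v, K, cam2world):
    # Ray direction through pixel (u, v) in camera coordinates.
    d_cam = np.array([(u - K[0, 2]) / K[0, 0], (v - K[1, 2]) / K[1, 1], -1.0])
    d_world = cam2world[:3, :3] @ d_cam                  # rotate into world space
    origin = cam2world[:3, 3]                            # position vector (camera center)
    return origin, d_world / np.linalg.norm(d_world)     # direction vector (unit length)
```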
FIGURE 4 illustrates a computing component 400 that includes one or more hardware processors 402 and machine-readable storage media 404 storing a set of machine-readable/machine-executable instructions that, when executed, cause the hardware processor (s) 402 to perform a method, according to various embodiments of the present disclosure. The computing component 400 may be, for example, the computing system 500 of FIGURE 5. The hardware processors 402 may include, for example, the processor (s) 504 of FIGURE 5 or any other processing unit described herein. The machine-readable storage media 404 may include the main memory 506, the read-only memory (ROM) 508, the storage 510 of FIGURE 5, and/or any other suitable machine-readable storage media described herein.
At block 406, the processor 402 can encode a radiance field of an object onto a machine learning model. The radiance field can include a three-dimensional rendering space depicting the object. The machine learning model can be a fully connected neural network and trained based on a set of images depicting the object from various viewpoints. The set of images can be converted into a continuous five-dimensional representation.
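By way of a non-limiting illustration, one possible form of such a fully connected network is sketched below in PyTorch; the layer widths and activations are illustrative choices, and the sketch omits refinements such as positional encoding:

```python
# Minimal sketch of a fully connected network mapping the five-dimensional input
# (3D position plus 2D viewing direction) to radiance and volume density.
import torch
import torch.nn as nn

class RadianceFieldMLP(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(5, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.radiance_head = nn.Linear(hidden, 3)   # RGB radiance
        self.density_head = nn.Linear(hidden, 1)    # volume density

    def forward(self, x):                           # x: (batch, 5) = (x, y, z, theta, phi)
        h = self.trunk(x)
        radiance = torch.sigmoid(self.radiance_head(h))   # bounded color values
        density = torch.relu(self.density_head(h))        # non-negative opacity
        return radiance, density
```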
At block 408, the processor 402 can generate a transformation matrix and a projection matrix with which to render an image of the object. The transformation matrix and the projection matrix can transform vertices of the object in the radiance field to a two-dimensional image space.
At block 410, the processor 402 can obtain, through the machine learning model, radiance values and volume density values associated with pixels of the image. The radiance values and the volume density values can be obtained based on position vectors and direction vectors associated with the pixels. The machine learning model can output the radiance values and the volume density values based on the position vectors and the direction vectors.
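For illustration only, and building on the network sketched above, such a query can look as follows; the random tensors stand in for real sample positions and viewing directions along a pixel's ray:

```python
# Illustrative usage of the RadianceFieldMLP sketch: query radiance and volume
# density for placeholder sample points along a pixel's ray.
import torch

model = RadianceFieldMLP()
positions = torch.rand(196, 3)                          # (x, y, z) per sample
directions = torch.rand(196, 2)                         # (theta, phi) per sample
radiances, densities = model(torch.cat([positions, directions], dim=-1))
print(radiances.shape, densities.shape)                 # (196, 3) and (196, 1)
```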
At block 412, the processor 402 can render color associated with the pixels based on a volume rendering technique. The volume rendering technique can include an application of a depth offset to a depth map associated with the object. The radiance field can be discretized into a plurality of voxels. The depth map can be generated based on surface information of the object in the radiance field. The surface information can be represented in triangular mesh. The triangular mesh can be generated based on a marching cube technique. The marching cube technique can determine the surface information of the object based on volume density values associated with the plurality of voxels. The depth offset can be a pre-defined constant value.
The techniques described herein, for example, are implemented by one or more special-purpose computing devices. The special-purpose computing devices may be hard-wired to perform the techniques, or may include circuitry or digital electronic devices such as one or more application-specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs) that are persistently programmed to perform the techniques, or may include one or more hardware processors programmed to perform the techniques pursuant to program instructions in firmware, memory, other storage, or a combination.
FIGURE 5 is a block diagram that illustrates a computer system 500 upon which any of various embodiments described herein may be implemented. The computer system 500 includes a bus 502 or other communication mechanism for communicating information, and one or more hardware processors 504 coupled with the bus 502 for processing information. A description that a device performs a task is intended to mean that one or more of the hardware processor (s) 504 performs the task.
The computer system 500 also includes a main memory 506, such as a random access memory (RAM) , cache and/or other dynamic storage devices, coupled to bus 502 for storing information and instructions to be executed by processor 504. Main memory 506 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 504. Such instructions, when stored in storage media accessible to processor 504, render computer system 500 into a special-purpose machine that is customized to perform the operations specified in the instructions.
The computer system 500 further includes a read only memory (ROM) 508 or other static storage device coupled to bus 502 for storing static information and instructions for processor 504. A storage device 510, such as a magnetic disk, optical disk, or USB thumb drive (Flash drive) , etc., is provided and coupled to bus 502 for storing information and instructions.
The computer system 500 may be coupled via bus 502 to output device (s) 512, such as a cathode ray tube (CRT) or LCD display (or touch screen) , for displaying information to a computer user. Input device (s) 514, including alphanumeric and other keys, are coupled to bus 502 for communicating information and command selections to processor 504. Another type of user input device is cursor control 516. The computer system 500 also includes a communication interface 518 coupled to bus 502.
Unless the context requires otherwise, throughout the present specification and claims, the word “comprise” and variations thereof, such as, “comprises” and “comprising” are to be construed in an open, inclusive sense, that is as “including, but not limited to. ” Recitation of numeric ranges of values throughout the specification is intended to serve as a shorthand notation of referring individually to each separate value falling within the range inclusive of the values defining the range, and each separate value is incorporated in the specification as if it were individually recited herein. Additionally, the singular forms “a, ” “an” and “the” include plural referents unless the context clearly dictates otherwise. The phrases “at least one of, ” “at least one selected from the group of, ” or “at least one selected from the group consisting of, ” and the like are to be interpreted in the disjunctive (e.g., not to be interpreted as at least one of A and at least one of B) .
Reference throughout this specification to “one embodiment” or “an embodiment” means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment, but may be in some instances. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
A component being implemented as another component may be construed as the component being operated in a same or similar manner as the another component, and/or comprising same or similar features, characteristics, and parameters as the another component.
Claims (20)
- A computer-implemented method comprising:
encoding, by a computing system, a radiance field of an object onto a machine learning model;
generating, by the computing system, a transformation matrix and a projection matrix with which to render an image of the object;
obtaining, by the computing system, through the machine learning model, radiance values and volume density values associated with pixels of the image, wherein the radiance values and the volume density values are obtained based on position vectors and direction vectors associated with the pixels; and
rendering, by the computing system, color associated with the pixels based on a volume rendering technique, wherein the volume rendering technique includes an application of a depth offset to a depth map associated with the object.
- The computer-implemented method of claim 1, wherein the radiance field includes a three-dimensional rendering space depicting the object.
- The computer-implemented method of claim 1, wherein the machine learning model is a fully connected neural network and trained based on a set of images depicting the object from various viewpoints, wherein the set of images is converted into a continuous five-dimensional representation.
- The computer-implemented method of claim 1, wherein the radiance field is discretized into a plurality of voxels, and wherein the depth map is generated based on surface information of the object in the radiance field, and wherein the surface information is represented in triangular mesh.
- The computer-implemented method of claim 4, wherein the triangular mesh is generated based on a marching cube technique, and wherein the marching cube technique determines the surface information of the object based on volume density values associated with the plurality of voxels.
- The computer-implemented method of claim 4, wherein the application of the depth offset to the depth map associated with the object reduces a number of voxels needed for the volume rendering technique.
- The computer-implemented method of claim 1, wherein the depth offset is a pre-defined constant value and indicates a range of voxels with which to render color of a pixel for the volume rendering technique.
- The computer-implemented method of claim 1, wherein the application of the depth offset to the depth map associated with the object reduces a number of iterations needed for the volume rendering technique.
- The computer-implemented method of claim 1, wherein the transformation matrix and the projection matrix transform vertices of the object in the radiance field to a two-dimensional image space.
- The computer-implemented method of claim 1, wherein the machine learning model outputs the radiance values and the volume density values based on the position vectors and the direction vectors.
- A system comprising:
one or more processors; and
a memory storing instructions that, when executed by the one or more processors, cause the system to perform a method comprising:
encoding a radiance field of an object onto a machine learning model;
generating a transformation matrix and a projection matrix with which to render an image of the object;
obtaining, through the machine learning model, radiance values and volume density values associated with pixels of the image, wherein the radiance values and the volume density values are obtained based on position vectors and direction vectors associated with the pixels; and
rendering color associated with the pixels based on a volume rendering technique, wherein the volume rendering technique includes an application of a depth offset to a depth map associated with the object.
- The system of claim 11, wherein the radiance field includes a three-dimensional rendering space depicting the object.
- The system of claim 12, wherein the machine learning model is a fully connected neural network and trained based on a set of images depicting the object from various viewpoints, wherein the set of images is converted into a continuous five-dimensional representation.
- The system of claim 13, wherein the radiance field is discretized into a plurality of voxels, and wherein the depth map is generated based on surface information of the object in the radiance field, and wherein the surface information is represented in triangular mesh.
- The system of claim 11, wherein the triangular mesh is generated based on a marching cube technique, and wherein the marching cube technique determines the surface information of the object based on volume density values associated with the plurality of voxels.
- A non-transitory storage medium of a computing system storing instructions that, when executed by one or more processors of the computing system, cause the computing system to perform a method comprising:
encoding a radiance field of an object onto a machine learning model;
generating a transformation matrix and a projection matrix with which to render an image of the object;
obtaining, through the machine learning model, radiance values and volume density values associated with pixels of the image, wherein the radiance values and the volume density values are obtained based on position vectors and direction vectors associated with the pixels; and
rendering color associated with the pixels based on a volume rendering technique, wherein the volume rendering technique includes an application of a depth offset to a depth map associated with the object.
- The non-transitory storage medium of claim 16, wherein the radiance field includes a three-dimensional rendering space depicting the object.
- The non-transitory storage medium of claim 17, wherein the machine learning model is a fully connected neural network and trained based on a set of images depicting the object from various viewpoints, wherein the set of images is converted into a continuous five-dimensional representation.
- The non-transitory storage medium of claim 18, wherein the radiance field is discretized into a plurality of voxels, and wherein the depth map is generated based on surface information of the object in the radiance field, and wherein the surface information is represented in triangular mesh.
- The non-transitory storage medium of claim 16, wherein the triangular mesh is generated based on a marching cube technique, and wherein the marching cube technique determines the surface information of the object based on volume density values associated with the plurality of voxels.
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202180095920.4A CN117083638A (en) | 2021-03-26 | 2021-03-26 | Accelerated Neural Radiation Fields for View Synthesis |
| PCT/CN2021/083446 WO2022198686A1 (en) | 2021-03-26 | 2021-03-26 | Accelerated neural radiance fields for view synthesis |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2021/083446 WO2022198686A1 (en) | 2021-03-26 | 2021-03-26 | Accelerated neural radiance fields for view synthesis |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2022198686A1 true WO2022198686A1 (en) | 2022-09-29 |
Family
ID=83395117
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2021/083446 Ceased WO2022198686A1 (en) | 2021-03-26 | 2021-03-26 | Accelerated neural radiance fields for view synthesis |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN117083638A (en) |
| WO (1) | WO2022198686A1 (en) |
Patent Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20070257913A1 (en) * | 2002-03-21 | 2007-11-08 | Microsoft Corporation | Graphics image rendering with radiance self-transfer for low-frequency lighting environments |
| US20090006052A1 (en) * | 2007-06-29 | 2009-01-01 | Microsoft Corporation | Real-Time Rendering of Light-Scattering Media |
| CN101477702A (en) * | 2009-02-06 | 2009-07-08 | 南京师范大学 | Built-in real tri-dimension driving method for computer display card |
| CN107680073A (en) * | 2016-08-02 | 2018-02-09 | 富士通株式会社 | The method and apparatus of geometrical reconstruction object |
| CN111667571A (en) * | 2020-06-08 | 2020-09-15 | 南华大学 | Method, Apparatus, Equipment and Medium for Rapid Reconstruction of Three-dimensional Distribution of Source Items in Nuclear Facilities |
Cited By (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115880378A (en) * | 2022-11-15 | 2023-03-31 | 中国科学院自动化研究所 | Method and device for determining color information in radiation field |
| WO2025006145A1 (en) * | 2023-06-30 | 2025-01-02 | Sony Interactive Entertainment LLC | Shaping neural radiance field (nerf) generation using multiple polygonal meshes |
| US12469220B2 (en) | 2023-06-30 | 2025-11-11 | Sony Interactive Entertainment LLC | Shaping neural radiance field (NERF) generation using multiple polygonal meshes |
| CN117058293A (en) * | 2023-08-15 | 2023-11-14 | 北京航空航天大学 | Scene self-adaptive fixation point nerve radiation field rendering method and system |
| CN117274472A (en) * | 2023-08-16 | 2023-12-22 | 武汉大学 | A method and system for generating aerial true imaging images based on implicit three-dimensional expression |
| CN117274472B (en) * | 2023-08-16 | 2024-05-31 | 武汉大学 | Aviation true projection image generation method and system based on implicit three-dimensional expression |
Also Published As
| Publication number | Publication date |
|---|---|
| CN117083638A (en) | 2023-11-17 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2022198686A1 (en) | Accelerated neural radiance fields for view synthesis | |
| CN110458939B (en) | Indoor scene modeling method based on visual angle generation | |
| Littwin et al. | Deep meta functionals for shape representation | |
| Trevithick et al. | Grf: Learning a general radiance field for 3d scene representation and rendering | |
| US20240013479A1 (en) | Methods and Systems for Training Quantized Neural Radiance Field | |
| US20240005590A1 (en) | Deformable neural radiance fields | |
| CN114170290B (en) | Image processing method and related equipment | |
| US11544898B2 (en) | Method, computer device and storage medium for real-time urban scene reconstruction | |
| WO2023004559A1 (en) | Editable free-viewpoint video using a layered neural representation | |
| CN117541755B (en) | RGB-D three-dimensional reconstruction-based rigid object virtual-real shielding method | |
| CN118674905B (en) | A perspective synthesis method based on 3D Gaussian splashing technology based on spatial coupling | |
| CN115564639B (en) | Background blur method, device, computer equipment and storage medium | |
| CN118781000B (en) | A monocular dense SLAM map construction method based on image enhancement and NeRF | |
| US20240104822A1 (en) | Multicore system for neural rendering | |
| US20220165029A1 (en) | Computer Vision Systems and Methods for High-Fidelity Representation of Complex 3D Surfaces Using Deep Unsigned Distance Embeddings | |
| CN118158489A (en) | Efficient streaming free viewpoint video generation method, computer device and program product based on 3D Gaussian model | |
| Lin et al. | A-SATMVSNet: An attention-aware multi-view stereo matching network based on satellite imagery | |
| CN115984583B (en) | Data processing method, apparatus, computer device, storage medium, and program product | |
| US20240371078A1 (en) | Real-time volumetric rendering | |
| CN114118367B (en) | Method and equipment for constructing incremental nerve radiation field | |
| CN120182507B (en) | NeRF and 3DGS mixed representation-based large-scene lightweight three-dimensional reconstruction method | |
| CN119006741A (en) | Three-dimensional reconstruction method, system, equipment and medium based on compressed symbol distance field | |
| US20220058484A1 (en) | Method for training a neural network to deliver the viewpoints of objects using unlabeled pairs of images, and the corresponding system | |
| Li et al. | Omnivoxel: A fast and precise reconstruction method of omnidirectional neural radiance field | |
| US20250363741A1 (en) | Depth rendering from neural radiance fields for 3d modeling |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21932304; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 202180095920.4; Country of ref document: CN |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 21932304; Country of ref document: EP; Kind code of ref document: A1 |