WO2022198686A1 - Accelerated neural radiance fields for view synthesis - Google Patents
Accelerated neural radiance fields for view synthesis
- Publication number: WO2022198686A1 (PCT/CN2021/083446)
- Authority: WIPO (PCT)
- Legal status: Ceased
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/50—Lighting effects
- G06T15/506—Illumination models
- G06T15/08—Volume rendering
Abstract
Described herein are systems, methods, and non-transitory computer-readable media configured to encode a radiance field of an object onto a machine learning model. A transformation matrix and a projection matrix with which to render an image of the object can be generated. Radiance values and volume density values associated with pixels of the image can be obtained through the machine learning model. The radiance values and the volume density values can be obtained based on position vectors and direction vectors associated with the pixels. Color associated with the pixels can be rendered based on a volume rendering technique. The volume rendering technique can include an application of a depth offset to a depth map associated with the object.
Description
Neural radiance field, or NeRF, is a framework that allows rendering of objects by utilizing radiance fields of the objects. A radiance field of an object can be generally thought of as a representation or visualization of the object in a three-dimensional rendering space through which various renderings, such as images or videos, of the object can be generated. In this way, novel views or video animations of the object can be rendered (e.g., synthesized, constructed, etc. ) based on the radiance field. In general, a NeRF framework with which to generate various renderings of an object can be computationally intensive. For example, after a machine learning model has been trained to encode a radiance field of an object (e.g., a three-dimensional rendering space of the object) onto the machine learning model, computations that are needed to render various views of the object through the radiance field can be taxing and can require high computational effort. As such, under current NeRF frameworks, real-time rendering of objects through radiance fields can be challenging, ineffective, and in some cases, nearly impractical.
SUMMARY
Described herein, in various embodiments, are systems, methods, and non-transitory computer-readable media configured to encode a radiance field of an object onto a machine learning model. A transformation matrix and a projection matrix with which to render an image of the object can be generated. Radiance values and volume density values associated with pixels of the image can be obtained through the machine learning model. The radiance values and the volume density values can be obtained based on position vectors and direction vectors associated with the pixels. Color associated with the pixels can be rendered based on a volume rendering technique. The volume rendering technique can include an application of a depth offset to a depth map associated with the object.
In some embodiments, the radiance field can include a three-dimensional rendering space depicting the object.
In some embodiments, the machine learning model can be a fully connected neural network and can be trained based on a set of images depicting the object from various viewpoints. The set of images can be converted into a continuous five-dimensional representation.
In some embodiments, the radiance field can be discretized into a plurality of voxels. The depth map can be generated based on surface information of the object in the radiance field. The surface information can be represented in triangular mesh.
In some embodiments, the triangular mesh can be generated based on a marching cube technique. The marching cube technique can determine the surface information of the object based on volume density values associated with the plurality of voxels.
In some embodiments, the application of the depth offset to the depth map associated with the object can reduce a number of voxels needed for the volume rendering technique.
In some embodiments, the depth offset can be a pre-defined constant value and can indicate a range of voxels with which to render color of a pixel for the volume rendering technique.
In some embodiments, the application of the depth offset to the depth map associated with the object can reduce a number of iterations needed for the volume rendering technique.
In some embodiments, the transformation matrix and the projection matrix can transform vertices of the object in the radiance field to a two-dimensional image space.
In some embodiments, the machine learning model can output the radiance values and the volume density values based on the position vectors and the direction vectors.
These and other features of the apparatuses, systems, methods, and non-transitory computer-readable media disclosed herein, as well as the methods of operation and functions of the related elements of structure and the combination of parts and economies of manufacture, will become more apparent upon consideration of the following description and the appended claims with reference to the accompanying drawings, all of which form a part of this specification, wherein like reference numerals designate corresponding parts in the various figures. It is to be expressly understood, however, that the drawings are for purposes of illustration and description only and are not intended as a definition of the limits of the invention.
Certain features of various embodiments of the present technology are set forth with particularity in the appended claims. A better understanding of the features and advantages of the technology will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings of which:
FIGURE 1 illustrates an example system, including an object view synthesis module, according to various embodiments of the present disclosure.
FIGURE 2 illustrates an example volume rendering module, according to various embodiments of the present disclosure.
FIGURE 3 illustrates an example radiance field, according to various embodiments of the present disclosure.
FIGURE 4 illustrates a computing component that includes one or more hardware processors and a machine-readable storage media storing a set of machine-readable/machine-executable instructions that, when executed, cause the hardware processor (s) to perform a method, according to various embodiments of the present disclosure.
FIGURE 5 is a block diagram that illustrates a computer system upon which any of various embodiments described herein may be implemented.
The figures depict various embodiments of the disclosed technology for purposes of illustration only, wherein the figures use like reference numerals to identify like elements. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated in the figures can be employed without departing from the principles of the disclosed technology described herein.
Neural radiance field, or NeRF, is a framework that allows rendering of objects by utilizing radiance fields of the objects. A radiance field of an object can be generally thought of as a representation or visualization of the object in a three-dimensional rendering space through which various renderings, such as images or videos, of the object can be generated. In this way, novel views or video animations of the object can be rendered (e.g., synthesized, constructed, etc. ) based on the radiance field. In general, a NeRF framework with which to generate various renderings of an object can be computationally intensive. For example, after a machine learning model has been trained to encode a radiance field of an object (e.g., a three-dimensional rendering space of the object) onto the machine learning model, computations that are needed to render various views of the object through the radiance field can be taxing and can require high computational effort. As such, under current NeRF frameworks, real-time rendering of objects through radiance fields can be challenging, ineffective, and in some cases, nearly impractical.
Described herein is a solution that addresses the problems described above. In various embodiments, an object view synthesis module can be configured to generate real-time renderings of an object based on a NeRF framework that reduces the computational effort needed to render the object in or near real-time. The object view synthesis module can train a machine learning model, such as a fully connected neural network, to encode a radiance field of the object onto the machine learning model. The machine learning model can be trained using a set of images depicting the object from various viewpoints. The radiance field can be a representation or visualization of the object in a three-dimensional rendering space through which various views of the object can be rendered (e.g., synthesized, constructed, etc.). Once the radiance field is encoded onto the machine learning model, the object view synthesis module can query the machine learning model to output radiance values and volume density values at various points associated with the radiance field from various vantage points. In general, a vantage point can be a point in the radiance field (i.e., the three-dimensional rendering space) from which an imaginary light ray can be injected into the radiance field in a particular direction through the point. As such, a vantage point can have a position vector indicating a location in the radiance field and a direction vector indicating a direction. For example, the object view synthesis module can query the machine learning model to output radiance values and volume density values based on a vantage point comprising a position vector concatenated with a direction vector. In this example, the machine learning model can output the radiance values and the volume density values along points in the radiance field through which an imaginary light ray corresponding to the position vector and the direction vector has traveled through the vantage point. In general, a vantage point can correspond to a pixel of an image to be rendered by the object view synthesis module. Therefore, if the object view synthesis module is instructed to render an image comprising 900 pixels (i.e., 30 pixels by 30 pixels), each pixel in the image can be a vantage point with which the object view synthesis module can query the machine learning model to output corresponding radiance values and volume density values. From the radiance values and the volume density values, the color of each pixel can be rendered. Once radiance values and volume density values of pixels of an image to be rendered are obtained, the object view synthesis module can render the image. To reduce computational effort in rendering the image, the object view synthesis module can discretize the radiance field into a plurality of voxels (e.g., three-dimensional graphical units). Volume density values associated with the plurality of voxels can be evaluated using a marching cube technique to extract surface information of the object in the radiance field as triangular mesh. Based on the triangular mesh, the object view synthesis module can generate a depth map of the object. A pre-defined offset can then be added to (i.e., biased onto) the depth map to reduce the computational effort in rendering the image, thereby enabling the object view synthesis module to render the object under the NeRF framework in real-time. These and other features of the object view synthesis module are discussed herein.
FIGURE 1 illustrates an example system 100, including an object view synthesis module 110, according to various embodiments of the present disclosure. The object view synthesis module 110 can be configured to generate real-time renderings of an object based on a NeRF framework. In some embodiments, the object view synthesis module 110 can be implemented, in part or in whole, as software, hardware, or any combination thereof. In some embodiments, the object view synthesis module 110 can be implemented, in part or in whole, as software running on one or more computing devices or systems, such as a cloud computing system. For example, the object view synthesis module 110 can be implemented, in part or in whole, on a cloud computing system to generate images of an object under a NeRF framework from various selected perspectives or viewpoints. Many variations are possible. In some embodiments, the object view synthesis module 110 can comprise a training data preparation module 112, a radiance field encoding module 114, and a volume rendering module 116. Each of these modules is discussed below.
In some embodiments, as shown in FIGURE 1, the system 100 can further include at least one data store 120. The object view synthesis module 110 can be configured to communicate and/or operate with the at least one data store 120. The at least one data store 120 can store various types of data associated with the object view synthesis module 110. For example, the at least one data store 120 can store training data with which to train a machine learning model to encode a radiance field of an object onto the machine learning model. The training data can include, for example, images depicting the object from various viewpoints. For instance, the at least one data store 120 can store a plurality of images depicting a dog to train a machine learning model to encode a radiance field of the dog onto the machine learning model. In some embodiments, the at least one data store 120 can store data relating to radiance fields such as radiance values and volume density values accessible to the object view synthesis module 110. In some embodiments, the at least one data store 120 can store various data relating to triangular mesh and depth maps accessible to the object view synthesis module 110. In some embodiments, the at least one data store 120 can store machine-readable instructions (e.g., codes) that, when executed, cause one or more computing systems to perform training of a machine learning model or render images based on radiance fields. Many variations are possible.
In some embodiments, the training data preparation module 112 can be configured to generate training data with which to train a machine learning model to encode a radiance field of an object onto the machine learning model. In general, a radiance field of an object can be a representation or visualization of the object in a three-dimensional rendering space through which various views of the object can be rendered (e.g., synthesized, constructed, etc.). The training data to encode the radiance field of the object can comprise a set of images depicting the object at various viewpoints. For example, a first image in the set of images can depict the object in a frontal view, a second image in the set of images can depict the object in a side view, a third image in the set of images can depict the object in a top view, etc. To reduce complexity and time to train the machine learning model, the training data preparation module 112 can convert the set of images into a continuous five-dimensional representation. In the continuous five-dimensional representation, each pixel in each image of the set of images can be represented by a position vector and a direction vector. The position vector can be represented in Euclidean coordinates (x, y, z) and the direction vector can be represented in spherical coordinates (θ, φ). As such, each pixel in each image of the set of images can be represented by parameters of x, y, z, θ, and φ, or in five dimensions, and the set of images can be represented by a single continuous string of parameters of x, y, z, θ, and φ. By representing training data in such a manner, the dimensionality of training the machine learning model to encode the radiance field of the object can be greatly reduced, thereby reducing the time needed to train the machine learning model. In some embodiments, position vectors and direction vectors of pixels in an image can be determined based on a pose associated with the image. A pose of an image is an estimation of a position and an orientation (or direction) of an object depicted in the image from a center of a camera from which the image was captured. In one implementation, a pose of an image can be estimated based on a structure from motion (SfM) technique. In another implementation, a pose of an image can be estimated based on a simultaneous localization and mapping (SLAM) technique. Many variations are possible.
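As an illustrative, non-limiting sketch of this conversion, the snippet below maps a single pixel to a five-dimensional sample (x, y, z, θ, φ) under an assumed pinhole camera model with an OpenGL-style camera convention; the helper name pixel_to_ray, the focal-length parameter, and the 4x4 camera-to-world pose matrix (e.g., obtained from an SfM or SLAM pipeline as described above) are assumptions for illustration rather than details taken from this disclosure.

```python
import numpy as np

def pixel_to_ray(u, v, width, height, focal, cam_to_world):
    """Convert pixel (u, v) to a position vector (x, y, z) and a direction
    expressed in spherical coordinates (theta, phi).

    cam_to_world is an assumed 4x4 camera-to-world pose matrix, e.g. estimated
    with an SfM or SLAM technique as described above.
    """
    # Direction of the pixel in camera coordinates (pinhole model, -z forward).
    d_cam = np.array([(u - width * 0.5) / focal,
                      -(v - height * 0.5) / focal,
                      -1.0])
    # Rotate into world coordinates and normalize.
    d_world = cam_to_world[:3, :3] @ d_cam
    d_world = d_world / np.linalg.norm(d_world)
    # The position vector is taken to be the camera center (ray origin).
    m = cam_to_world[:3, 3]
    # Express the direction in spherical coordinates (theta, phi).
    theta = np.arccos(d_world[2])             # polar angle
    phi = np.arctan2(d_world[1], d_world[0])  # azimuth
    return m, (theta, phi)                    # five parameters: x, y, z, theta, phi
```

Concatenating m with (θ, φ) yields the five parameters for that pixel, so a set of posed images can be flattened into the continuous five-dimensional representation described above.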
In some embodiments, the radiance field encoding module 114 can be configured to encode a radiance field of an object onto a machine learning model based on training data provided by the training data preparation module 112. Once the radiance field of the object is encoded onto the machine learning model, the machine learning model can be queried to output radiance values and volume density values associated with points in the radiance field from various vantage points. In general, a vantage point can be a point in the radiance field (i.e., the three-dimensional rendering space) from which an imaginary light ray can be injected into the radiance field in a particular direction through the point. As such, a vantage point can have a position vector indicating a location in the radiance field and a direction vector indicating a direction. As an illustrative example, in some embodiments, the machine learning model can be queried based on a vantage point comprising a position vector concatenated with a direction vector. An imaginary light ray can be generated to travel through the radiance field at a point indicated by the position vector and in a direction indicated by the direction vector. In this example, the machine learning model can output radiance values and volume density values along points in the radiance field through which the imaginary light ray has traveled. In some embodiments, the machine learning model can be implemented using a multilayer perceptron (MLP). In other embodiments, the machine learning model can be implemented using other machine learning models. Many variations are possible. For example, in one implementation, the machine learning model can be implemented using a neural network comprising nine fully-connected perceptron layers. This neural network can be trained to encode a radiance field of an object. In this implementation, the neural network can take a position vector corresponding to a point as input, and output a volume density value and a feature vector for the point at the eighth layer of the neural network. The feature vector can then be concatenated with a direction vector corresponding to the point and passed to the last layer of the neural network to output a radiance value for the point.
In some embodiments, the machine learning model configured by the radiance field encoding module 114 can be expressed as follows:
f (m, s) = [ρ, r]
where m is a position vector at a point in a radiance field, s is a direction vector at the point in the radiance field, ρ denotes volume density values along a direction of the direction vector in the radiance field, and r denotes radiance values along the direction of the direction vector in the radiance field. In this regard, the machine learning model can be expressed as a function, f, that takes the position vector and the direction vector as inputs and outputs the radiance values and the volume density values along the direction of the direction vector in the radiance field. During training of the machine learning model, parameters associated with the machine learning model (e.g., weights of the neural network) can be optimized, through back-propagation, such that f converges to a reference radiance field (e.g., the ground truth radiance field). Once f converges to the reference radiance field within some thresholds, training for the machine learning model is deemed complete and the parameters for the machine learning model become fixed. The trained machine learning model can output radiance values (e.g., r) and volume density values (e.g., ρ) along any direction corresponding to any point (e.g., m, s) in the radiance field.
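As an illustrative, non-limiting sketch, the network below follows the nine-layer description above: the encoded position vector is the input, the eighth layer emits a volume density value and a feature vector, and the encoded direction vector is concatenated with that feature vector before the final layer outputs radiance. The layer width of 256, the ReLU and sigmoid activations, and the default input sizes (chosen so the two encoded inputs total 76 dimensions, consistent with the positional encoding discussed below) are assumptions rather than specifics of this disclosure.

```python
import torch
import torch.nn as nn

class RadianceFieldMLP(nn.Module):
    """Nine fully-connected layers: the eighth layer produces a volume density
    and a feature vector; the direction encoding is concatenated with that
    feature vector before the ninth layer outputs radiance (RGB). Widths and
    activations are assumptions, not taken from the disclosure."""

    def __init__(self, pos_dim=60, dir_dim=16, width=256):
        super().__init__()
        layers = [nn.Linear(pos_dim, width)]
        for _ in range(6):
            layers.append(nn.Linear(width, width))
        self.trunk = nn.ModuleList(layers)           # layers 1-7
        self.layer8 = nn.Linear(width, width + 1)    # feature vector + volume density
        self.layer9 = nn.Linear(width + dir_dim, 3)  # radiance (RGB)

    def forward(self, m_enc, s_enc):
        h = m_enc
        for layer in self.trunk:
            h = torch.relu(layer(h))
        out = self.layer8(h)
        density = torch.relu(out[..., :1])           # rho >= 0
        feature = out[..., 1:]
        radiance = torch.sigmoid(self.layer9(torch.cat([feature, s_enc], dim=-1)))
        return density, radiance
```

During training, the parameters of such a network would be optimized by back-propagation against reference views until f converges, as described above.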
In some embodiments, to prevent over-smoothing of the radiance field, a positional encoding technique can be applied during training of the machine learning model. The positional encoding technique can transform the position vector, m, and the direction vector, s, from a low-dimensional space (e.g., five dimensions) to a higher-dimensional space to increase fidelity of the radiance field. In some embodiments, the positional encoding technique can be based on a sinusoidal expression as shown:
γ (x) = [sin (2^0 x), cos (2^0 x), sin (2^1 x), cos (2^1 x), ..., sin (2^L x), cos (2^L x)]
where L is a hyper-parameter. In one implementation, L is set to 9 for γ (m) and 4 for γ (s) . In this implementation, the positional encoding technique can allow the machine learning model to take 76-dimension vectors as inputs instead of 5-dimension vectors (e.g., x, y, z, θ, φ) to output radiance values and volume density values. In this way, the machine learning model can be biased toward encoding the radiance field in higher fidelity.
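As an illustrative, non-limiting sketch, the expression above can be implemented as shown below. Applying the encoding per vector component, and the exact frequency count used to reach the 76-dimension input mentioned above, are implementation details this sketch assumes rather than details taken from this disclosure.

```python
import numpy as np

def positional_encoding(x, L):
    """gamma(x) = [sin(2^0 x), cos(2^0 x), ..., sin(2^L x), cos(2^L x)],
    applied to every component of x to lift it to a higher-dimensional space."""
    x = np.asarray(x, dtype=np.float64)
    bands = 2.0 ** np.arange(L + 1)           # frequencies 2^0 ... 2^L
    scaled = x[..., None] * bands             # shape (..., dim, L + 1)
    enc = np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)
    return enc.reshape(*x.shape[:-1], -1)     # flatten per-component encodings

# Example: encode a position vector m (L = 9) and a direction vector s (L = 4).
m_enc = positional_encoding([0.1, -0.3, 0.7], L=9)
s_enc = positional_encoding([1.2, 0.4], L=4)
```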
In some embodiments, the volume rendering module 116 can be configured to generate renderings (e.g., images, videos, etc. ) of an object based on a radiance field of the object. The volume rendering module 116 can generate a depth map of the object based on volume density values associated with the radiance field. The volume rendering module 116 can generate a rendering of the object for each given transformation matrix and projection matrix. To generate the rendering of the object in or near real-time, the volume rendering module 116 can be biased with a predefined offset value added to the depth map of the object. This bias reduces a number of summation iterations needed to render color of each pixel in the rendering. The volume rendering module 116 will be discussed in further detail with reference to FIGURE 2 herein.
FIGURE 2 illustrates an example volume rendering module 200, according to various embodiments of the present disclosure. In some embodiments, the volume rendering module 116 of FIGURE 1 can be implemented as the volume rendering module 200. As shown in FIGURE 2, in some embodiments, the volume rendering module 200 can include a depth map generation module 202, a perspective generation module 204, and an image rendering module 206. Each of these modules will be discussed below.
In some embodiments, the depth map generation module 202 can be configured to generate a depth map of an object based on a radiance field of the object. Initially, the depth map generation module 202 can evaluate volume density values of points associated with the radiance field of the object. The volume density values can represent opacities of the points. The depth map generation module 202 can discretize the radiance field into a plurality of voxels. A voxel is a unit of graphic information in a three-dimensional space, similar to a pixel in a two-dimensional image. The depth map generation module 202 can obtain a volume density value associated with each voxel by querying a machine learning model that encoded the radiance field to output the volume density value for the voxel. Based on the volume density values of the plurality of voxels, the depth map generation module 202 can, using a marching cube technique, generate surfaces (e.g., isosurfaces) for the object, and the surfaces can be represented as triangular meshes. In general, voxels corresponding to surfaces of an object have approximately equal volume density values, voxels corresponding to regions near the surfaces of the object have high volume density values, and voxels corresponding to regions away from the surfaces of the object have low volume density values. Based on these principles, the triangular meshes for the object can be generated. Correspondingly, the depth map of the object can be generated based on the triangular meshes using conventional techniques.
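As an illustrative, non-limiting sketch of this step, the snippet below discretizes the rendering space into a voxel grid, queries an already-trained model for a volume density per voxel, and extracts a triangular mesh with a marching-cubes routine (here scikit-image's measure.marching_cubes is used as a stand-in for "a marching cube technique"); the grid resolution, spatial bound, density threshold, and the query_density helper are assumptions for illustration.

```python
import numpy as np
from skimage import measure

def extract_mesh(query_density, resolution=128, bound=1.0, level=10.0):
    """Discretize the radiance field into a voxel grid, query the trained
    model for a volume density per voxel, and extract a triangular mesh."""
    # Build a regular grid of voxel centers spanning the rendering space.
    xs = np.linspace(-bound, bound, resolution)
    grid = np.stack(np.meshgrid(xs, xs, xs, indexing="ij"), axis=-1)  # (R, R, R, 3)
    # query_density is assumed to map (N, 3) positions to (N,) densities.
    densities = query_density(grid.reshape(-1, 3)).reshape(resolution, resolution, resolution)
    # Marching cubes finds the isosurface where the density crosses `level`.
    verts, faces, _, _ = measure.marching_cubes(densities, level=level)
    # Map vertex indices back to world coordinates.
    verts = verts / (resolution - 1) * (2 * bound) - bound
    return verts, faces
```

The resulting triangular mesh can then be used to produce the depth map, for example by rasterizing it as sketched after the next paragraph.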
In some embodiments, the perspective generation module 204 can be configured to generate a transformation matrix and a projection matrix with which to render an image of an object through a radiance field of the object. The transformation matrix and the projection matrix can be generated based on a point (e.g., a perspective) associated with the radiance field looking at the object. For example, a radiance field can depict an artifact. In this example, a point in the radiance field can be selected such that the artifact is positioned and oriented in the radiance field from the perspective of the point (e.g., “framing” the artifact) . In this regard, the transformation matrix and the projection matrix are transformations that, together, can transform vertices of the object from the radiance field to a two-dimensional image space. Through the image space, the image of the object can be rendered.
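As an illustrative, non-limiting sketch of how such matrices can map the object into a two-dimensional image space, the snippet below projects mesh vertices with a 4x4 transformation matrix and a 4x4 projection matrix and keeps the nearest depth per pixel as a simple depth map; the matrix conventions and the per-vertex (rather than per-triangle) rasterization are assumptions made to keep the example short.

```python
import numpy as np

def render_depth_map(verts, transform, projection, width, height):
    """Project mesh vertices to the image plane and keep the nearest depth
    per pixel (a point-based stand-in for a full rasterizer)."""
    # Homogeneous coordinates: world -> camera -> clip space.
    v = np.concatenate([verts, np.ones((len(verts), 1))], axis=1)
    clip = (projection @ (transform @ v.T)).T
    ndc = clip[:, :3] / clip[:, 3:4]                 # perspective divide
    # Normalized device coordinates -> pixel coordinates.
    px = ((ndc[:, 0] + 1) * 0.5 * (width - 1)).astype(int)
    py = ((1 - ndc[:, 1]) * 0.5 * (height - 1)).astype(int)
    # Treat the w component as camera-space depth (standard perspective matrix assumed).
    depth = clip[:, 3]
    depth_map = np.full((height, width), np.inf)
    inside = (px >= 0) & (px < width) & (py >= 0) & (py < height) & (depth > 0)
    for x, y, d in zip(px[inside], py[inside], depth[inside]):
        depth_map[y, x] = min(depth_map[y, x], d)
    return depth_map
```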
In some embodiments, the image rendering module 206 can be configured to render an image of an object for a given transformation matrix and a given projection matrix. In general, the image rendering module 206 can utilize volume rendering techniques to render an image. In such techniques, an image can be rendered (constructed) by compositing pixels in an image space as indicated by the transformation and projection matrices. The image rendering module 206 can render color of each pixel based on an absorption and emission particle model in which the color of a pixel is determined based on a light ray injected into a radiance field of the object. In some embodiments, the image rendering module 206 can render color of a pixel based on a numerical quadrature expression shown:
$$\hat{C}(l) = \sum_{i=1}^{N} \exp\!\left(-\sum_{j=1}^{i-1} \rho_j \delta_j\right) \left(1 - \exp(-\rho_i \delta_i)\right) r_i$$

where $\hat{C}(l)$ is a color value of a pixel, $\rho_i$ is a volume density value of an $i$-th voxel obtained through a machine learning model that encoded the radiance field, $r_i$ is a radiance value of the $i$-th voxel obtained through the machine learning model that encoded the radiance field, $\delta_i$ is a Euclidean distance between the $i$-th voxel and an $(i+1)$-th voxel, and $N$ is a total number of voxels along a light ray $l$. In volume rendering techniques, $N$ can be 196. As such, as shown in the expression, rendering a color value of a pixel of an image can take up to 196 summation iterations (i.e., $N$ = 196) and can take into account radiance values and volume density values of 196 voxels. Such volume rendering techniques can therefore be computationally impractical for rendering an image in real time.
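For illustration only, the quadrature for a single light ray can be sketched as follows, assuming the per-sample densities, radiances, and spacings have already been obtained from the machine learning model; the array names are assumptions of the sketch:

```python
# Minimal sketch of the quadrature above for one light ray.
# densities: rho_i, shape (N,); radiances: r_i as RGB rows, shape (N, 3); deltas: delta_i, shape (N,)
import numpy as np

def composite_ray(densities, radiances, deltas):
    alphas = 1.0 - np.exp(-densities * deltas)                     # per-sample opacity
    # Transmittance: fraction of light surviving up to (but not including) sample i.
    transmittance = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])
    weights = transmittance * alphas                               # contribution of each sample
    return (weights[:, None] * radiances).sum(axis=0)              # pixel color (RGB)

# Example with N = 196 placeholder samples.
N = 196
color = composite_ray(np.random.rand(N), np.random.rand(N, 3), np.full(N, 0.01))
```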
In some embodiments, to reduce the computational effort of rendering an image of an object, a depth offset can be added to a depth map of the object as generated by the depth map generation module 202. In general, the depth map can provide information relating to distributions of volume density values of voxels in a radiance field of the object. Because voxels corresponding to surfaces have relatively high volume density values while voxels corresponding to non-surface regions have low volume density values, only voxels that have high volume density values are used to compute color values of pixels during image rendering, thereby greatly reducing the number of summation iterations (e.g., N) needed to compute pixel color. The depth offset can be added to the depth map to determine a range of voxels with which to render color of a pixel. For example, a light ray corresponding to a pixel can be injected into a radiance field of an object. This light ray can travel through 100 voxels in the radiance field. In this example, the 60th voxel is a voxel with a high volume density value. In this example, a depth offset can be added to a depth map of the object, and this depth offset can correspond to 5 voxels (e.g., a range of voxels) in the radiance field. Therefore, to compute a color value for the pixel, radiance values and volume density values of the 60th, 61st, 62nd, 63rd, 64th, and 65th voxels are summed using the numerical quadrature expression above. In some embodiments, the depth offset can be a pre-defined constant value. In other embodiments, the depth offset can be user adjusted. In some embodiments, adding a depth offset can reduce the number of voxels needed to compute a color value of a pixel from 196 to 8. As such, in such embodiments, the image rendering module 206 can render a color value of a pixel in 8 summation iterations, thereby making real-time rendering of an image under a NeRF framework possible.
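As a non-limiting sketch, the narrow band of samples selected by the depth offset can be generated as follows; the names `surface_depth` and `offset` stand in for the depth-map lookup and the pre-defined depth offset and are assumptions of the sketch:

```python
# Minimal sketch of restricting the quadrature to a small band of samples starting
# at the surface depth given by the depth map, instead of marching through every voxel.
import numpy as np

def band_limited_samples(ray_origin, ray_dir, surface_depth, offset, n_samples=8):
    # Sample only from the surface depth to the surface depth plus the offset.
    t_vals = np.linspace(surface_depth, surface_depth + offset, n_samples)
    points = ray_origin[None, :] + t_vals[:, None] * ray_dir[None, :]     # (n_samples, 3)
    deltas = np.full(n_samples, offset / max(n_samples - 1, 1))           # spacing delta_i
    return points, deltas   # query the model at `points`, then composite with `deltas`
```

The returned sample points and spacings can then be composited with the quadrature sketched earlier.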
FIGURE 3 illustrates an example radiance field 300, according to various embodiments of the present disclosure. As discussed above, the radiance field 300 can be encoded onto a machine learning model by training the machine learning model with a set of images. As shown in FIGURE 3, the radiance field 300 can include an object 302 in a three-dimensional rendering space 304. An image 306 can be generated based on the radiance field 300 through a volume rendering technique. In the volume rendering technique, a pixel 308 of the image 306 can be associated with a position vector and a direction vector. The position vector can indicate a point in the three-dimensional rendering space 304 at which to generate an imaginary light ray 310. A direction of the imaginary light ray 310 can be indicated by the direction vector. The imaginary light ray 310 can be injected into the three-dimensional rendering space 304 along the direction vector. Points 312 in the radiance field 300 through which the imaginary light ray 310 travels can provide radiance values and volume density values. By introducing an offset to a depth map associated with the object 302 (not shown) , the number of points 312 needed to render color of the pixel 308 can be reduced. Because the number of points to iterate over is reduced, the time needed to render the color of the pixel 308 is also reduced, thereby allowing the image 306 to be rendered in or near real-time. In FIGURE 3, for simplicity, the image 306 is rendered as a frontal view of the object 302. Other views of the object 302 can also be rendered through the volume rendering technique described herein. For example, an isometric view of the object 302 can also be rendered through the volume rendering technique.
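By way of a non-limiting illustration, the position vector and direction vector of such an imaginary light ray can be derived for a pixel as sketched below, assuming a pinhole camera with an intrinsic matrix `K` and a camera-to-world matrix `cam2world`; these names are illustrative assumptions:

```python
# Minimal sketch: turn a pixel coordinate (u, v) into a ray origin (position vector)
# and a unit ray direction (direction vector) in world space.
import numpy as np

def pixel_ray(u, v, K, cam2world):
    # Ray direction through pixel (u, v) in camera coordinates.
    d_cam = np.array([(u - K[0, 2]) / K[0, 0], (v - K[1, 2]) / K[1, 1], -1.0])
    d_world = cam2world[:3, :3] @ d_cam                  # rotate into world space
    origin = cam2world[:3, 3]                            # position vector (camera center)
    return origin, d_world / np.linalg.norm(d_world)     # direction vector (unit length)
```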
FIGURE 4 illustrates a computing component 400 that includes one or more hardware processors 402 and machine-readable storage media 404 storing a set of machine-readable/machine-executable instructions that, when executed, cause the hardware processor (s) 402 to perform a method, according to various embodiments of the present disclosure. The computing component 400 may be, for example, the computing system 500 of FIGURE 5. The hardware processors 402 may include, for example, the processor (s) 504 of FIGURE 5 or any other processing unit described herein. The machine-readable storage media 404 may include the main memory 506, the read-only memory (ROM) 508, the storage 510 of FIGURE 5, and/or any other suitable machine-readable storage media described herein.
At block 406, the processor 402 can encode a radiance field of an object onto a machine learning model. The radiance field can include a three-dimensional rendering space depicting the object. The machine learning model can be a fully connected neural network and trained based on a set of images depicting the object from various viewpoints. The set of images can be converted into a continuous five-dimensional representation.
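By way of a non-limiting illustration, one possible form of such a fully connected network is sketched below in PyTorch; the layer widths and activations are illustrative choices, and the sketch omits refinements such as positional encoding:

```python
# Minimal sketch of a fully connected network mapping the five-dimensional input
# (3D position plus 2D viewing direction) to radiance and volume density.
import torch
import torch.nn as nn

class RadianceFieldMLP(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(5, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.radiance_head = nn.Linear(hidden, 3)   # RGB radiance
        self.density_head = nn.Linear(hidden, 1)    # volume density

    def forward(self, x):                           # x: (batch, 5) = (x, y, z, theta, phi)
        h = self.trunk(x)
        radiance = torch.sigmoid(self.radiance_head(h))   # bounded color values
        density = torch.relu(self.density_head(h))        # non-negative opacity
        return radiance, density
```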
At block 408, the processor 402 can generate a transformation matrix and a projection matrix with which to render an image of the object. The transformation matrix and the projection matrix can transform vertices of the object in the radiance field to a two-dimensional image space.
At block 410, the processor 402 can obtain, through the machine learning model, radiance values and volume density values associated with pixels of the image. The radiance values and the volume density values can be obtained based on position vectors and direction vectors associated with the pixels. The machine learning model can output the radiance values and the volume density values based on the position vectors and the direction vectors.
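For illustration only, and building on the network sketched above, such a query can look as follows; the random tensors stand in for real sample positions and viewing directions along a pixel's ray:

```python
# Illustrative usage of the RadianceFieldMLP sketch: query radiance and volume
# density for placeholder sample points along a pixel's ray.
import torch

model = RadianceFieldMLP()
positions = torch.rand(196, 3)                          # (x, y, z) per sample
directions = torch.rand(196, 2)                         # (theta, phi) per sample
radiances, densities = model(torch.cat([positions, directions], dim=-1))
print(radiances.shape, densities.shape)                 # (196, 3) and (196, 1)
```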
At block 412, the processor 402 can render color associated with the pixels based on a volume rendering technique. The volume rendering technique can include an application of a depth offset to a depth map associated with the object. The radiance field can be discretized into a plurality of voxels. The depth map can be generated based on surface information of the object in the radiance field. The surface information can be represented in triangular mesh. The triangular mesh can be generated based on a marching cube technique. The marching cube technique can determine the surface information of the object based on volume density values associated with the plurality of voxels. The depth offset can be a pre-defined constant value.
The techniques described herein, for example, are implemented by one or more special-purpose computing devices. The special-purpose computing devices may be hard-wired to perform the techniques, or may include circuitry or digital electronic devices such as one or more application-specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs) that are persistently programmed to perform the techniques, or may include one or more hardware processors programmed to perform the techniques pursuant to program instructions in firmware, memory, other storage, or a combination.
FIGURE 5 is a block diagram that illustrates a computer system 500 upon which any of various embodiments described herein may be implemented. The computer system 500 includes a bus 502 or other communication mechanism for communicating information, and one or more hardware processors 504 coupled with the bus 502 for processing information. A description that a device performs a task is intended to mean that one or more of the hardware processor (s) 504 performs the task.
The computer system 500 also includes a main memory 506, such as a random access memory (RAM) , cache and/or other dynamic storage devices, coupled to bus 502 for storing information and instructions to be executed by processor 504. Main memory 506 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 504. Such instructions, when stored in storage media accessible to processor 504, render computer system 500 into a special-purpose machine that is customized to perform the operations specified in the instructions.
The computer system 500 further includes a read only memory (ROM) 508 or other static storage device coupled to bus 502 for storing static information and instructions for processor 504. A storage device 510, such as a magnetic disk, optical disk, or USB thumb drive (Flash drive) , etc., is provided and coupled to bus 502 for storing information and instructions.
The computer system 500 may be coupled via bus 502 to output device (s) 512, such as a cathode ray tube (CRT) or LCD display (or touch screen) , for displaying information to a computer user. Input device (s) 514, including alphanumeric and other keys, are coupled to bus 502 for communicating information and command selections to processor 504. Another type of user input device is cursor control 516. The computer system 500 also includes a communication interface 518 coupled to bus 502.
Unless the context requires otherwise, throughout the present specification and claims, the word “comprise” and variations thereof, such as, “comprises” and “comprising” are to be construed in an open, inclusive sense, that is as “including, but not limited to. ” Recitation of numeric ranges of values throughout the specification is intended to serve as a shorthand notation of referring individually to each separate value falling within the range inclusive of the values defining the range, and each separate value is incorporated in the specification as if it were individually recited herein. Additionally, the singular forms “a, ” “an” and “the” include plural referents unless the context clearly dictates otherwise. The phrases “at least one of, ” “at least one selected from the group of, ” or “at least one selected from the group consisting of, ” and the like are to be interpreted in the disjunctive (e.g., not to be interpreted as at least one of A and at least one of B) .
Reference throughout this specification to “one embodiment” or “an embodiment” means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment, but may be in some instances. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
A component being implemented as another component may be construed as the component being operated in a same or similar manner as the another component, and/or comprising same or similar features, characteristics, and parameters as the another component.
Claims (20)
- A computer-implemented method comprising:
encoding, by a computing system, a radiance field of an object onto a machine learning model;
generating, by the computing system, a transformation matrix and a projection matrix with which to render an image of the object;
obtaining, by the computing system, through the machine learning model, radiance values and volume density values associated with pixels of the image, wherein the radiance values and the volume density values are obtained based on position vectors and direction vectors associated with the pixels; and
rendering, by the computing system, color associated with the pixels based on a volume rendering technique, wherein the volume rendering technique includes an application of a depth offset to a depth map associated with the object.
- The computer-implemented method of claim 1, wherein the radiance field includes a three-dimensional rendering space depicting the object.
- The computer-implemented method of claim 1, wherein the machine learning model is a fully connected neural network and trained based on a set of images depicting the object from various viewpoints, wherein the set of images is converted into a continuous five-dimensional representation.
- The computer-implemented method of claim 1, wherein the radiance field is discretized into a plurality of voxels, and wherein the depth map is generated based on surface information of the object in the radiance field, and wherein the surface information is represented in triangular mesh.
- The computer-implemented method of claim 4, wherein the triangular mesh is generated based on a marching cube technique, and wherein the marching cube technique determines the surface information of the object based on volume density values associated with the plurality of voxels.
- The computer-implemented method of claim 4, wherein the application of the depth offset to the depth map associated with the object reduces a number of voxels needed for the volume rendering technique.
- The computer-implemented method of claim 1, wherein the depth offset is a pre-defined constant value and indicates a range of voxels with which to render color of a pixel for the volume rendering technique.
- The computer-implemented method of claim 1, wherein the application of the depth offset to the depth map associated with the object reduces a number of iterations needed for the volume rendering technique.
- The computer-implemented method of claim 1, wherein the transformation matrix and the projection matrix transform vertices of the object in the radiance field to a two-dimensional image space.
- The computer-implemented method of claim 1, wherein the machine learning model outputs the radiance values and the volume density values based on the position vectors and the direction vectors.
- A system comprising:
one or more processors; and
a memory storing instructions that, when executed by the one or more processors, cause the system to perform a method comprising:
encoding a radiance field of an object onto a machine learning model;
generating a transformation matrix and a projection matrix with which to render an image of the object;
obtaining, through the machine learning model, radiance values and volume density values associated with pixels of the image, wherein the radiance values and the volume density values are obtained based on position vectors and direction vectors associated with the pixels; and
rendering color associated with the pixels based on a volume rendering technique, wherein the volume rendering technique includes an application of a depth offset to a depth map associated with the object.
- The system of claim 11, wherein the radiance field includes a three-dimensional rendering space depicting the object.
- The system of claim 12, wherein the machine learning model is a fully connected neural network and trained based on a set of images depicting the object from various viewpoints, wherein the set of images is converted into a continuous five-dimensional representation.
- The system of claim 13, wherein the radiance field is discretized into a plurality of voxels, and wherein the depth map is generated based on surface information of the object in the radiance field, and wherein the surface information is represented in triangular mesh.
- The system of claim 11, wherein the triangular mesh is generated based on a marching cube technique, and wherein the marching cube technique determines the surface information of the object based on volume density values associated with the plurality of voxels.
- A non-transitory storage medium of a computing system storing instructions that, when executed by one or more processors of the computing system, cause the computing system to perform a method comprising:
encoding a radiance field of an object onto a machine learning model;
generating a transformation matrix and a projection matrix with which to render an image of the object;
obtaining, through the machine learning model, radiance values and volume density values associated with pixels of the image, wherein the radiance values and the volume density values are obtained based on position vectors and direction vectors associated with the pixels; and
rendering color associated with the pixels based on a volume rendering technique, wherein the volume rendering technique includes an application of a depth offset to a depth map associated with the object.
- The non-transitory storage medium of claim 16, wherein the radiance field includes a three-dimensional rendering space depicting the object.
- The non-transitory storage medium of claim 17, wherein the machine learning model is a fully connected neural network and trained based on a set of images depicting the object from various viewpoints, wherein the set of images is converted into a continuous five-dimensional representation.
- The non-transitory storage medium of claim 18, wherein the radiance field is discretized into a plurality of voxels, and wherein the depth map is generated based on surface information of the object in the radiance field, and wherein the surface information is represented in triangular mesh.
- The non-transitory storage medium of claim 16, wherein the triangular mesh is generated based on a marching cube technique, and wherein the marching cube technique determines the surface information of the object based on volume density values associated with the plurality of voxels.
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202180095920.4A CN117083638A (en) | 2021-03-26 | 2021-03-26 | Accelerated Neural Radiation Fields for View Synthesis |
| PCT/CN2021/083446 WO2022198686A1 (en) | 2021-03-26 | 2021-03-26 | Accelerated neural radiance fields for view synthesis |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2021/083446 WO2022198686A1 (en) | 2021-03-26 | 2021-03-26 | Accelerated neural radiance fields for view synthesis |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2022198686A1 true WO2022198686A1 (en) | 2022-09-29 |
Family
ID=83395117
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2021/083446 Ceased WO2022198686A1 (en) | 2021-03-26 | 2021-03-26 | Accelerated neural radiance fields for view synthesis |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN117083638A (en) |
| WO (1) | WO2022198686A1 (en) |
Patent Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20070257913A1 (en) * | 2002-03-21 | 2007-11-08 | Microsoft Corporation | Graphics image rendering with radiance self-transfer for low-frequency lighting environments |
| US20090006052A1 (en) * | 2007-06-29 | 2009-01-01 | Microsoft Corporation | Real-Time Rendering of Light-Scattering Media |
| CN101477702A (en) * | 2009-02-06 | 2009-07-08 | 南京师范大学 | Built-in real tri-dimension driving method for computer display card |
| CN107680073A (en) * | 2016-08-02 | 2018-02-09 | 富士通株式会社 | The method and apparatus of geometrical reconstruction object |
| CN111667571A (en) * | 2020-06-08 | 2020-09-15 | 南华大学 | Method, Apparatus, Equipment and Medium for Rapid Reconstruction of Three-dimensional Distribution of Source Items in Nuclear Facilities |
Cited By (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115880378A (en) * | 2022-11-15 | 2023-03-31 | 中国科学院自动化研究所 | Method and device for determining color information in radiation field |
| WO2025006145A1 (en) * | 2023-06-30 | 2025-01-02 | Sony Interactive Entertainment LLC | Shaping neural radiance field (nerf) generation using multiple polygonal meshes |
| US12469220B2 (en) | 2023-06-30 | 2025-11-11 | Sony Interactive Entertainment LLC | Shaping neural radiance field (NERF) generation using multiple polygonal meshes |
| CN117058293A (en) * | 2023-08-15 | 2023-11-14 | 北京航空航天大学 | Scene self-adaptive fixation point nerve radiation field rendering method and system |
| CN117274472A (en) * | 2023-08-16 | 2023-12-22 | 武汉大学 | A method and system for generating aerial true imaging images based on implicit three-dimensional expression |
| CN117274472B (en) * | 2023-08-16 | 2024-05-31 | 武汉大学 | Aviation true projection image generation method and system based on implicit three-dimensional expression |
Also Published As
| Publication number | Publication date |
|---|---|
| CN117083638A (en) | 2023-11-17 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2022198686A1 (en) | Accelerated neural radiance fields for view synthesis | |
| CN110458939B (en) | Indoor scene modeling method based on visual angle generation | |
| Littwin et al. | Deep meta functionals for shape representation | |
| Trevithick et al. | Grf: Learning a general radiance field for 3d scene representation and rendering | |
| US20240013479A1 (en) | Methods and Systems for Training Quantized Neural Radiance Field | |
| US20240005590A1 (en) | Deformable neural radiance fields | |
| CN114170290B (en) | Image processing method and related equipment | |
| US11544898B2 (en) | Method, computer device and storage medium for real-time urban scene reconstruction | |
| WO2023004559A1 (en) | Editable free-viewpoint video using a layered neural representation | |
| CN117541755B (en) | RGB-D three-dimensional reconstruction-based rigid object virtual-real shielding method | |
| CN118674905B (en) | A perspective synthesis method based on 3D Gaussian splashing technology based on spatial coupling | |
| CN115564639B (en) | Background blur method, device, computer equipment and storage medium | |
| CN118781000B (en) | A monocular dense SLAM map construction method based on image enhancement and NeRF | |
| US20240104822A1 (en) | Multicore system for neural rendering | |
| US20220165029A1 (en) | Computer Vision Systems and Methods for High-Fidelity Representation of Complex 3D Surfaces Using Deep Unsigned Distance Embeddings | |
| CN118158489A (en) | Efficient streaming free viewpoint video generation method, computer device and program product based on 3D Gaussian model | |
| Lin et al. | A-SATMVSNet: An attention-aware multi-view stereo matching network based on satellite imagery | |
| CN115984583B (en) | Data processing method, apparatus, computer device, storage medium, and program product | |
| US20240371078A1 (en) | Real-time volumetric rendering | |
| CN114118367B (en) | Method and equipment for constructing incremental nerve radiation field | |
| CN120182507B (en) | NeRF and 3DGS mixed representation-based large-scene lightweight three-dimensional reconstruction method | |
| CN119006741A (en) | Three-dimensional reconstruction method, system, equipment and medium based on compressed symbol distance field | |
| US20220058484A1 (en) | Method for training a neural network to deliver the viewpoints of objects using unlabeled pairs of images, and the corresponding system | |
| Li et al. | Omnivoxel: A fast and precise reconstruction method of omnidirectional neural radiance field | |
| US20250363741A1 (en) | Depth rendering from neural radiance fields for 3d modeling |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21932304; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 202180095920.4; Country of ref document: CN |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 21932304; Country of ref document: EP; Kind code of ref document: A1 |