WO2025199044A1 - Systems and methods for ai-assisted construction of three-dimensional models - Google Patents
- Publication number
- WO2025199044A1 (PCT/US2025/020256)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- model
- images
- image
- encodings
- views
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/10—Geometric CAD
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
- G06F30/27—Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/04—Texture mapping
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/10—Geometric effects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/10—Geometric effects
- G06T15/20—Perspective computation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/10—Geometric effects
- G06T15/20—Perspective computation
- G06T15/205—Image-based rendering
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T17/20—Finite element generation, e.g. wire-frame surface description, tesselation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T19/00—Manipulating 3D models or images for computer graphics
Definitions
- Computing devices may render three-dimensional (3D) objects (sometimes referred to herein as “3D graphical objects,” “3D digital objects,” “3D models,” or simply “objects” or “models”) for various purposes.
- a computing device may render 3D graphical objects for use in video games, simulations, and/or online environments. Such 3D graphical objects may be generated in a variety of different ways.
- a computing device may generate a 3D graphical object based on input received from a human operator via a computer-aided design (CAD) tool.
- a computing device may generate a 3D graphical object based on images and/or scans of a corresponding tangible, real-world object.
- the techniques described herein relate to a three-dimensional modeling method including: (a) obtaining, by one or more processors, a first plurality of encodings of a first plurality of images of an object; (b) generating, by the one or more processors and one or more image-generating models, a second plurality of encodings of a second plurality of images of the object, the second plurality of encodings being based on the first plurality of encodings and on conditioning data; (c) constructing, by the one or more processors based on the second plurality of encodings, a 3D model of the object, the 3D model indicating (1) a geometric form of the object and (2) a texture and/or material of at least one surface of the object; and repeating steps (a) - (c) until the one or more processors determine that the 3D model of the object satisfies one or more criteria.
- the techniques described herein relate to a method, wherein the 3D model of the object is a second 3D model of the object, the method further including: obtaining a first 3D model of the object; and rendering, from a plurality of viewpoints of the first 3D model of the object, the first plurality of images of the object.
- the techniques described herein relate to a method, wherein the first 3D model of the object is obtained from a vector database based on a description of the object.
- the techniques described herein relate to a method, further including: obtaining, from a vector database based on a description of the object, the first plurality of images of the object.
- the techniques described herein relate to a method, wherein the 3D model of the object is a second 3D model of the object, the method further including: obtaining a first 3D model of the object; rendering, from a set of viewpoints of the first 3D model of the object, a set of views of the object; calculating a plurality of scores based on encodings of views in the set of views, wherein the scores indicate an amount of similarity between or among groups of two or more views in the set of views; and obtaining, based on the plurality of scores, the first plurality of images of the object.
- the techniques described herein relate to a method, wherein obtaining the first plurality of images includes: selecting, based on the plurality of scores, a subset of the set of views, wherein the first plurality of images is the selected subset of the set of views.
- the techniques described herein relate to a method, wherein the set of views is a first set of views, the set of viewpoints is a first set of viewpoints, and obtaining the first plurality of images of the object includes: determining, based on the plurality of scores, a second set of viewpoints of the object; and rendering, by the one or more processors, a second set of views of the first 3D model of the object from the second set of viewpoints, wherein the first plurality of images is the second set of views.
- the techniques described herein relate to a method, wherein: the encodings of the views in the set of views are embeddings of the views in the set of views, and the amount of similarity between or among a respective group of two or more views is determined based on cosine similarity between the embeddings of respective views of the two or more views.
- the techniques described herein relate to a method, wherein a total number of images in the first plurality of images is less than a total number of views in the set of views.
- the techniques described herein relate to a method, wherein the first plurality of encodings is a first plurality of embeddings, the second plurality of encodings is a second plurality of embeddings, and the generating the second plurality of embeddings includes: conditioning an image generation process performed in a latent space of the one or more image-generating models on the first plurality of embeddings and on one or more embeddings obtained from a vector database based on the conditioning data.
- the techniques described herein relate to a method, wherein the conditioning data include a description of the object.
- the techniques described herein relate to a method, wherein the description of the object includes a description of one or more geometric attributes of the object, one or more visual attributes of the object, and/or one or more optical attributes of the object.
- the techniques described herein relate to a method, wherein the conditioning data include a description of one or more alterations to (1) an aesthetic of the object as depicted in the first plurality of images or represented in the first 3D model of the object, (2) a geometric attribute of the object as depicted in the first plurality of images or represented in the first 3D model of the object, (3) a visual attribute of the object as depicted in the first plurality of images or represented in the first 3D model of the object, and/or (4) an optical attribute of the object as depicted in the first plurality of images or represented in the first 3D model of the object.
- the techniques described herein relate to a method, wherein the generating the second plurality of embeddings further includes providing, by the one or more processors, the first plurality of embeddings as input to the one or more image-generating models.
- the techniques described herein relate to a method, wherein the one or more image-generating models include a latent, text-to-image diffusion model.
- the techniques described herein relate to a method, wherein the geometric form of the 3D model includes one or more shapes, dimensions, and/or orientations of one or more portions of the 3D model.
- the techniques described herein relate to a method, wherein one or more visual attributes and/or optical attributes of the object includes a texture, material, shading, lighting, reflectivity, and/or color of a surface of the object.
- the techniques described herein relate to a method, wherein the one or more criteria include (1) receipt of user input indicating that the 3D model of the object is satisfactory, (2) receipt of user input requesting termination of modeling, (3) expiry of a maximum time period allocated for the modeling, and/or (4) use of a maximum amount of computational resources allocated for the modeling.
- the techniques described herein relate to at least one computer-readable storage medium encoded with computer-executable instructions that, when executed by at least one processor, cause the at least one processor to perform operations including: (a) obtaining a first plurality of encodings of a first plurality of images of an object; (b) generating, by one or more image-generating models, a second plurality of encodings of a second plurality of images of the object, the second plurality of encodings being based on the first plurality of encodings and on conditioning data; (c) constructing, based on the second plurality of encodings, a 3D model of the object, the 3D model indicating (1) a geometric form of the object and (2) a texture and/or material of at least one surface of the object; and repeating steps (a) - (c) until the at least one processor determines that the 3D model of the object satisfies one or more criteria.
- the techniques described herein relate to a system including: at least one processor; and at least one storage medium having encoded thereon executable instructions that, when executed by the at least one processor, cause the at least one processor to carry out a method including: (a) obtaining a first plurality of encodings of a first plurality of images of an object; (b) generating, by one or more image-generating models, a second plurality of encodings of a second plurality of images of the object, the second plurality of encodings being based on the first plurality of encodings and on conditioning data; (c) constructing, based on the second plurality of encodings, a 3D model of the object, the 3D model indicating (1) a geometric form of the object and (2) a texture and/or material of at least one surface of the object; and repeating steps (a) - (c) until the at least one processor determines that the 3D model of the object satisfies one or more criteria.
- the techniques described herein relate to a three-dimensional modeling method including: obtaining, by one or more processors, a plurality of encodings of a first plurality of images of an object; calculating, by the one or more processors, a plurality of scores based on the plurality of encodings, wherein each score indicates an amount of similarity between or among a respective group of two or more images of the first plurality of images; obtaining, by the one or more processors and based on the plurality of scores, a second plurality of images; and reconstructing, by the one or more processors and based on the second plurality of images, a three-dimensional (3D) model of the object.
- the techniques described herein relate to a method, wherein the first plurality of images depict a plurality of views of the object, and wherein the object is a virtual 3D object or a physical 3D object.
- the techniques described herein relate to a method, wherein the 3D model of the object is a second 3D model, and wherein obtaining the first plurality of images of the object includes: rendering, by the one or more processors, a plurality of views of a first 3D model of the object from a plurality of viewpoints, wherein the first plurality of images include the plurality of views.
- the techniques described herein relate to a method, wherein obtaining the second plurality of images includes: selecting, by the one or more processors and based on the plurality of scores, a subset of the first plurality of images, wherein the second plurality of images is the selected subset of the first plurality of images.
- the techniques described herein relate to a method, wherein the plurality of views is a first plurality of views, the plurality of viewpoints is a first plurality of viewpoints, and obtaining the second plurality of images of the object includes: determining, by the one or more processors and based on the plurality of scores, a second plurality of viewpoints of the object; and rendering, by the one or more processors, a second plurality of views of the first 3D model of the object from the second plurality of viewpoints, wherein the second plurality of images is the second plurality of views.
- the techniques described herein relate to a method, wherein the obtaining the first plurality of images of the object further includes: obtaining, by the one or more processors, the first 3D model of the object.
- the techniques described herein relate to a method, wherein obtaining the first 3D model of the object includes: generating, by the one or more processors and one or more image-generating models, one or more views of the object based on a description of the object, wherein the first 3D model of the object is reconstructed based on at least a subset of the one or more generated views.
- the techniques described herein relate to a method, wherein: the plurality of encodings of the first plurality of images are a plurality of embeddings of the first plurality of images, and the amount of similarity between or among a respective group of two or more images is determined based on cosine similarity between the embeddings of respective pairs of the two or more images.
- the techniques described herein relate to a method, wherein a total number of the second plurality of images is less than a total number of the first plurality of images.
- the techniques described herein relate to at least one computer-readable storage medium encoded with computer-executable instructions that, when executed by at least one processor, cause the at least one processor to carry out a method including: obtaining a plurality of encodings of a first plurality of images of an object; calculating a plurality of scores based on the plurality of encodings, wherein each score indicates an amount of similarity between or among a respective group of two or more images of the first plurality of images; obtaining, based on the plurality of scores, a second plurality of images; and reconstructing, based on the second plurality of images, a three-dimensional (3D) model of the object.
- the techniques described herein relate to an at least one computer-readable storage medium, wherein the 3D model of the object is a second 3D model, and wherein obtaining the first plurality of images of the object includes: rendering a plurality of views of a first 3D model of the object from a plurality of viewpoints, wherein the first plurality of images include the plurality of views.
- the techniques described herein relate to an at least one computer-readable storage medium, wherein obtaining the second plurality of images includes: selecting, based on the plurality of scores, a subset of the first plurality of images, wherein the second plurality of images is the selected subset of the first plurality of images.
- the techniques described herein relate to an at least one computer-readable storage medium, wherein: the plurality of encodings of the first plurality of images are a plurality of embeddings of the first plurality of images, and the amount of similarity between or among a respective group of two or more images is determined based on cosine similarity between the embeddings of respective pairs of the two or more images.
- the techniques described herein relate to an at least one computer-readable storage medium, wherein a total number of the second plurality of images is less than a total number of the first plurality of images.
- the techniques described herein relate to a system including: at least one processor; and at least one storage medium having encoded thereon executable instructions that, when executed by the at least one processor, cause the at least one processor to carry out a method including: obtaining a plurality of encodings of a first plurality of images of an object; calculating a plurality of scores based on the plurality of encodings, wherein each score indicates an amount of similarity between or among a respective group of two or more images of the first plurality of images; obtaining, based on the plurality of scores, a second plurality of images; and reconstructing, based on the second plurality of images, a three-dimensional (3D) model of the object.
- the techniques described herein relate to a system, wherein the plurality of views is a first plurality of views, the plurality of viewpoints is a first plurality of viewpoints, and obtaining the second plurality of images of the object includes: determining, based on the plurality of scores, a second plurality of viewpoints of the object; and rendering a second plurality of views of the first 3D model of the object from the second plurality of viewpoints, wherein the second plurality of images is the second plurality of views.
- the techniques described herein relate to a system, wherein: the plurality of encodings of the first plurality of images are a plurality of embeddings of the first plurality of images, and the amount of similarity between or among a respective group of two or more images is determined based on cosine similarity between the embeddings of respective pairs of the two or more images.
- the techniques described herein relate to a three-dimensional modeling method including: obtaining, by one or more processors, a first plurality of encodings of a first plurality of images of an object; generating, by the one or more processors and one or more image-generating models, a second plurality of encodings of a second plurality of images of the object, the second plurality of encodings being based on the first plurality of encodings and on conditioning data; and constructing, by the one or more processors based on the second plurality of encodings, a geometric form of a 3D model of the object.
- the techniques described herein relate to a method, wherein the 3D model of the object is a second 3D model of the object, the method further including: obtaining a first 3D model of the object; and rendering, from a plurality of viewpoints of the first 3D model of the object, the first plurality of images of the object.
- the techniques described herein relate to a method, wherein the first 3D model of the object is obtained from a vector database based on a description of the object.
- the techniques described herein relate to a method, further including: obtaining, from a vector database based on a description of the object, the first plurality of images of the object.
- the techniques described herein relate to a method, wherein the first plurality of encodings is a first plurality of embeddings, the second plurality of encodings is a second plurality of embeddings, and the generating the second plurality of embeddings includes: conditioning an image generation process performed in a latent space of the one or more image-generating models on the first plurality of embeddings and on one or more embeddings obtained from a vector database based on the conditioning data.
- the techniques described herein relate to a method, wherein the conditioning data include a description of the object and/or a description of one or more geometric attributes of the object.
- the techniques described herein relate to a method, wherein the generating the second plurality of embeddings further includes providing, by the one or more processors, the first plurality of embeddings as input to the one or more image-generating models.
- the techniques described herein relate to a method, wherein the one or more image-generating models include a latent, text-to-image diffusion model.
- the techniques described herein relate to a method, wherein the geometric form of the 3D model includes one or more shapes, dimensions, and/or orientations of one or more portions of the 3D model.
- the techniques described herein relate to a method, wherein the constructing the geometric form of the 3D model of the object includes: determining whether the geometric form of the 3D model of the object satisfies one or more criteria.
- the techniques described herein relate to at least one computer-readable storage medium encoded with computer-executable instructions that, when executed by at least one processor, cause the at least one processor to carry out a method including: obtaining a first plurality of encodings of a first plurality of images of an object; generating, by one or more image-generating models, a second plurality of encodings of a second plurality of images of the object, the second plurality of encodings being based on the first plurality of encodings and on conditioning data; and constructing, based on the second plurality of encodings, a geometric form of a 3D model of the object.
- the techniques described herein relate to an at least one computer-readable storage medium, wherein the 3D model of the object is a second 3D model of the object, the method further including: obtaining a first 3D model of the object; and rendering, from a plurality of viewpoints of the first 3D model of the object, the first plurality of images of the object.
- the techniques described herein relate to an at least one computer-readable storage medium, further including: obtaining, from a vector database based on a description of the object, the first 3D model of the object or the first plurality of images of the object.
- the techniques described herein relate to an at least one computer-readable storage medium, wherein the first plurality of encodings is a first plurality of embeddings, the second plurality of encodings is a second plurality of embeddings, and the generating the second plurality of embeddings includes: conditioning an image generation process performed in a latent space of the one or more image-generating models on the first plurality of embeddings and on one or more embeddings obtained from a vector database based on the conditioning data.
- the techniques described herein relate to an at least one computer-readable storage medium, wherein: the geometric form of the 3D model includes one or more shapes, dimensions, and/or orientations of one or more portions of the 3D model; and the constructing the geometric form of the 3D model of the object includes: determining whether the geometric form of the 3D model of the object satisfies one or more criteria.
- the techniques described herein relate to a system including: at least one processor; and at least one storage medium having encoded thereon executable instructions that, when executed by the at least one processor, cause the at least one processor to carry out a method including: obtaining a first plurality of encodings of a first plurality of images of an object; generating, by one or more image-generating models, a second plurality of encodings of a second plurality of images of the object, the second plurality of encodings being based on the first plurality of encodings and on conditioning data; and constructing, based on the second plurality of encodings, a geometric form of a 3D model of the object.
- the techniques described herein relate to a system, wherein the 3D model of the object is a second 3D model of the object, the method further including: obtaining a first 3D model of the object; and rendering, from a plurality of viewpoints of the first 3D model of the object, the first plurality of images of the object.
- the techniques described herein relate to a system, further including: obtaining, from a vector database based on a description of the object, the first 3D model of the object or the first plurality of images of the object.
- the techniques described herein relate to a system, wherein the first plurality of encodings is a first plurality of embeddings, the second plurality of encodings is a second plurality of embeddings, and the generating the second plurality of embeddings includes: conditioning an image generation process performed in a latent space of the one or more image-generating models on the first plurality of embeddings and on one or more embeddings obtained from a vector database based on the conditioning data.
- the techniques described herein relate to a system, wherein: the geometric form of the 3D model includes one or more shapes, dimensions, and/or orientations of one or more portions of the 3D model; and the constructing the geometric form of the 3D model of the object includes: determining whether the geometric form of the 3D model of the object satisfies one or more criteria.
- the techniques described herein relate to a three-dimensional modeling method including: obtaining, by one or more processors, a first encoding of a first image; generating, by the one or more processors and one or more image-generating models, a second encoding of a second image, the second encoding being based on the first encoding and on conditioning data; and mapping, by the one or more processors, the second image to a surface of a three-dimensional (3D) model of an object, thereby defining a texture and/or material of the surface of the 3D model.
- the techniques described herein relate to a method, wherein the 3D model of the object is a second 3D model of the object, the method further including: obtaining a first 3D model of the object; and rendering, from a viewpoint of the first 3D model of the object, a view of the surface of the model, wherein the first image includes at least a portion of the view.
- the techniques described herein relate to a method, wherein the first 3D model of the object is obtained from a vector database based on a description of the object.
- the techniques described herein relate to a method, wherein the first image is obtained from a vector database based on a description of the texture and/or material.
- the techniques described herein relate to a method, wherein the first encoding is a first embedding, the second encoding is a second embedding, and the generating the second embedding includes: conditioning an image generation process performed in a latent space of the one or more image-generating models on the embedding and on the conditioning data.
- the conditioning data include a description of one or more visual attributes and/or optical attributes of the object.
- the techniques described herein relate to a method, wherein the description of the one or more visual attributes and/or optical attributes of the object includes a description of a texture, material, shading, lighting, reflectivity, and/or color of a surface of the object.
- the techniques described herein relate to a method, wherein the generating the second embedding includes providing, by the one or more processors, the first embedding as input to the one or more image-generating models.
- the techniques described herein relate to a method, wherein the one or more image-generating models include a latent, text-to-image diffusion model.
- the techniques described herein relate to a method, further including: determining whether the texture and/or material of the surface of the 3D model satisfy one or more criteria.
- the techniques described herein relate to at least one computer-readable storage medium encoded with computer-executable instructions that, when executed by at least one processor, cause the at least one processor to carry out a method including: obtaining a first encoding of a first image; generating, by one or more image-generating models, a second encoding of a second image, the second encoding being based on the first encoding and on conditioning data; and mapping the second image to a surface of a three-dimensional (3D) model of an object, thereby defining a texture and/or material of the surface of the 3D model.
- the techniques described herein relate to an at least one computer-readable storage medium, wherein the 3D model of the object is a second 3D model of the object, the method further including: obtaining a first 3D model of the object; and rendering, from a viewpoint of the first 3D model of the object, a view of the surface of the model, wherein the first image includes at least a portion of the view.
- the techniques described herein relate to an at least one computer-readable storage medium, wherein the first image is obtained from a vector database based on a description of the texture and/or material.
- the techniques described herein relate to an at least one computer-readable storage medium, wherein the first encoding is a first embedding, the second encoding is a second embedding, and the generating the second embedding includes: conditioning an image generation process performed in a latent space of the one or more image-generating models on the embedding and on an embedding obtained from a vector database based on the conditioning data.
- the techniques described herein relate to an at least one computer-readable storage medium, further including: determining whether the texture and/or material of the surface of the 3D model satisfy one or more criteria.
- the techniques described herein relate to a system including: at least one processor; and at least one storage medium having encoded thereon executable instructions that, when executed by the at least one processor, cause the at least one processor to carry out a method including: obtaining a first encoding of a first image; generating, by one or more image-generating models, a second encoding of a second image, the second encoding being based on the first encoding and on conditioning data; and mapping the second image to a surface of a three-dimensional (3D) model of an object, thereby defining a texture and/or material of the surface of the 3D model.
- the techniques described herein relate to a system, wherein the 3D model of the object is a second 3D model of the object, the method further including: obtaining a first 3D model of the object; and rendering, from a viewpoint of the first 3D model of the object, a view of the surface of the model, wherein the first image includes at least a portion of the view.
- the techniques described herein relate to a system, wherein the first image is obtained from a vector database based on a description of the texture and/or material.
- the techniques described herein relate to a system, wherein the first encoding is a first embedding, the second encoding is a second embedding, and the generating the second embedding includes: conditioning an image generation process performed in a latent space of the one or more image-generating models on the embedding and on an embedding obtained from a vector database based on the conditioning data.
- FIG.1A is a block diagram of an example system for constructing 3D models.
- FIG.1B is a block diagram of an example system for applying textures and/or materials to surfaces of 3D models.
- FIG.2 is a flowchart of an example method for constructing a 3D model of an object based on captured views of the object.
- FIG.3 is a flowchart of an example method for constructing a geometric form of a 3D model.
- FIG.4 is a flowchart of an example method for applying a texture and/or a material to a surface of a 3D model.
- FIG.5 is a flowchart of an example method for constructing a geometric form of a 3D model with textures and/or materials applied to surfaces of the 3D model.
- FIG.6 is a block diagram of an example computing device.
- FIG.7 shows some examples of workspace content.
- FIG.8 shows some additional examples of workspace content.
- FIG.9 shows block diagrams illustrating some examples of retrieval augmented construction (RAC) processes.
- FIG.10 shows examples of views of a 3D object, including a view synthesized using a photogrammetry process.
- FIG.11A shows an example input view used during an example process of constructing a 3D model.
- FIG.11B shows an example 3D model constructed based on the input view of FIG.11A during an example process of constructing a 3D model.
- FIG.11C shows examples of candidate viewpoints from which example views of the 3D model of FIG.11B can be captured during an example process of constructing a 3D model.
- FIG.11D shows an example 3D model constructed based on example views of the 3D model of FIG. 11C during an example process of constructing a 3D model.
- FIG.12 shows some examples of 3D models.
- FIG.13 shows some examples of 3D models illustrating the application of latent space conditioning during an example process of constructing a 3D model.
- FIG.14 shows a block diagram of an example tool-forming 3D pipeline.
- FIG.15 shows a block diagram of an example scalable AI infrastructure for 3D model generation.
- FIG.16 is a data flow diagram illustrating an example method for finding a 3D model that matches (e.g., most closely matches) a user query.
- FIG.17 is a block diagram of an example image-generating model.
- FIG.18 is a data flow diagram illustrating an example method for generating multi-view normal and color maps from single points of view.
- FIG.19 shows examples of 3D models of various objects.
- FIG.20A is a flowchart of an example process of generating a texture and applying the texture to a 3D model.
- FIG.20B is a flowchart of an example sub-process for normal maps.
- FIG.20C is a flowchart of an example sub-process for segmentation maps.
- FIG.20D is a flowchart of a UV map node arrangement process for unconventional materials.
- FIG.20E is a flowchart of another process of generating a texture.
- FIG.21 is a block diagram illustrating some operations performed by an example tool-forming pipeline while generating a 3D model of a car.
- FIG.22 shows example user interfaces of an example tool for styling and upscaling a 3D model.
- FIG.23A shows an example user interface of an example sampler tool.
- FIG.23B shows an example user interface of a 3D modeling tool.
- FIGS.24A, 24B, and 24C show example sequences of inputs to an example 3D model generation tool and outputs from the tool during an example process of generating a 3D model of a sofa.
- FIGS.24D, 24E, and 24F show example sequences of inputs to an example 3D model generation tool and outputs from the tool during an example process of generating a 3D model of a tiger head.
- FIGS.24G, 24H, and 24I show example sequences of inputs to an example 3D model generation tool and outputs from the tool during an example process of generating a 3D model of a treasure chest.
- FIG.25A is a block diagram illustrating an example of a portion of a tool-forming process.
- FIG.25B is a block diagram of an example blender model that includes a stable diffusion model and a ControlNet model.
- FIGS.25C, 25D, and 25E show examples of similarity level tests.
- identical reference characters and descriptions indicate similar, but not necessarily identical, elements.
- Described herein are various techniques for configuring and leveraging AI tools in a manner that infuses 3D object generation with an artistic, constructive workflow. Constructive workflows, such as those used by some embodiments described herein, represent a significant departure from conventional reconstruction-based methods of 3D model generation. Further, some methods described herein may allow for a specification of artistic choices in a manner other than numeric or textual input.
- Some embodiments may yield more authentic, higher-quality 3D object generation.
- Some embodiments described herein may leverage workspace conditioning techniques to produce high-quality 3D objects.
- the use of workspace conditioning may provide AI-based model-generating tools with an ability to interpret creator requests and generate corresponding goals and constraints, which may be used to guide the object creation process.
- a “tool-forming” process involves dynamically configuring and manipulating the tools and resources within a professional 3D object creation pipeline to achieve the desired outcomes.
- an expert system uses goals and constraints derived from the creator’s request to steer the tool-forming process, ensuring that the final 3D model produced by the pipeline exhibits the desired attributes.
- workspace conditioning may involve analyzing a creator’s (e.g., a user’s) artistic workspace, which may contain concept art, reference images, design documents, and other relevant materials.
- a 3D model can be generated that may be more closely aligned with the creator’s creative vision.
- a model-generation workflow may leverage a multi-modal embedding space to facilitate AI-driven generation of models that align with a desired aesthetic.
- text, images/graphics, 3D models, textures, and other forms of content may be collectively mapped to locations in a multi-dimensional coordinate space that mathematically represents the attributes of the content.
- Content to which a conditioning model has been exposed in training, such as images, 3D models, text, textures, and so on, and concepts reflected in the content, may each have a position in the embedding space. Similarities or differences between concepts may be quantified by the embedding space, based on geometric distances between coordinates. That embedding space may be used alone to retrieve content in one or more formats (e.g., images, 3D models, etc.) based on a query (e.g., a text query).
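- To make the retrieval idea concrete, the following is a minimal, illustrative sketch (not taken from the patent) of querying such a shared embedding space with a text prompt. The embed_text callable and the (asset_id, embedding) index are hypothetical stand-ins for whatever multi-modal encoder and vector store an implementation might use; cosine similarity is one plausible choice of geometric distance.

```python
# Hypothetical sketch: retrieve workspace content (images, 3D models, textures, ...)
# whose embeddings lie closest to a text query in a shared multi-modal space.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_text: str, index: list[tuple[str, np.ndarray]],
             embed_text, top_k: int = 5) -> list[str]:
    """Return identifiers of the top_k assets most similar to the query."""
    query_emb = embed_text(query_text)                      # hypothetical text encoder
    scored = [(cosine_similarity(query_emb, emb), asset_id) for asset_id, emb in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)     # highest similarity first
    return [asset_id for _, asset_id in scored[:top_k]]
```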
- an AI tool may leverage input content and the embedding space to define, in a latent embedding space, a manifold interconnecting different points in the embedding space that each are determined to relate to the input content.
- That manifold and the points it interconnects may be understood as a mathematical representation, determined by the AI tool, of various aspects of the desired aesthetic that is characterized or suggested by the various inputs.
- an AI tool may be able to objectively account for desired aesthetics even in cases where a user may not be able to verbalize or specify what those desired aesthetics are.
- an AI tool may be able to objectively quantify a match between a 3D model output using the techniques and one or more of the inputs provided, by identifying a geometric distance between a coordinate in the embedding space representing the output and the coordinate(s) representing the inputs.
- Such an objective measure of match may also be used in the 3D model generation process, such as through the model generation being constrained to create a representation that has a measure of match (e.g., a geometric distance in the embedding space) of no more than a set distance or other criterion.
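- As one hedged illustration of such a constraint (again, not language from the patent), a generation loop could reject a candidate output whose embedding drifts too far from the embeddings of the supplied inputs. The 0.25 cosine-distance budget and the embedding inputs are purely illustrative.

```python
# Hypothetical sketch: accept a generated result only if its embedding stays within a
# fixed cosine-distance budget of every creator-supplied input embedding.
import numpy as np

def within_aesthetic_budget(output_emb: np.ndarray,
                            input_embs: list[np.ndarray],
                            max_cosine_distance: float = 0.25) -> bool:
    for emb in input_embs:
        sim = np.dot(output_emb, emb) / (np.linalg.norm(output_emb) * np.linalg.norm(emb))
        if 1.0 - sim > max_cosine_distance:
            return False
    return True
```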
- some embodiments described herein may allow for specification of a desired visual aesthetic or other artistic choices through input of sample materials, which may represent a user’s (e.g., an artist’s) prior work, work similar to what is desired, etc.
- the samples may specify artistic style, shape, color, lighting, contrast, and the like, some or all of which may be used in generating an output 3D model.
- these systems and methods may involve and/or integrate new workflows and/or expert decision-making into some embodiments of AI tools that cause one or more computing devices to generate 3D models of objects.
- these systems and methods may facilitate, support, and/or provide one or more AI tools that efficiently generate more authentic, higher quality 3D graphical objects for use in video games, simulations, online environments, etc.
- 3D objects having aesthetic attributes aligned with the aesthetic attributes of the user- and/or project-specific artistic content are generated using an iterative process of image-based model construction (e.g., synthetic photogrammetry), view capture, and image generation via latent space conditioning.
- an initial model of an object may be generated.
- the initial model may be basic (e.g., may have generic aesthetic qualities for the type of object being generated) and/or incomplete (e.g., portions of the model may be ill-formed or noisy).
- Synthetic 2D images (e.g., “views”) of the model from various viewpoints may be obtained.
- These synthetic 2D images may be provided as input to an AI model, which may generate new images based on the input images.
- the AI model may infer the existence of attributes of the object that are unclear in the input images, and may add representations of those attributes to the new images.
- the embeddings representing user- and/or project-specific artistic content may be used to condition the process by which the AI model generates the new images, such that the aesthetic of the new images is more aligned with the aesthetic of the user- and/or project-specific artistic content.
- the new images generated by the AI model may “fill in” details that are unclear or inconsistent in the input images, and may alter the object’s aesthetic in the images.
- the AI model may be a diffusion model (e.g., a stable diffusion model).
- a new 3D model of the object may be generated based on the new images. This process may be repeated one or more times until a high-quality 3D model of the object aligned with the aesthetic of the user- and/or project-specific artistic content is generated. See, e.g., the description of “synthetic data generation,” “object creation,” and “latent space manipulation” in the section titled “Some Examples.”
- During the process of generating the 3D model, a novel technique may be used to identify and/or select the viewpoints from which the synthetic 2D images (or “views”) are obtained.
- the technique described herein may capture comprehensive views of an object using fewer viewpoints. Thus, using this technique may yield a smaller (e.g., minimal) set of views without sacrificing the quality of the photogrammetric process.
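- One way to picture this redundancy-based pruning is the greedy filter sketched below. It is an assumption-laden illustration, not the patent’s algorithm: view_embeddings are embeddings of candidate views (e.g., produced by an image encoder), and a view is kept only if it is not too similar, by cosine similarity, to any view already kept. The 0.9 threshold is arbitrary.

```python
# Hypothetical sketch: greedily keep views whose embeddings are not near-duplicates of
# already-kept views, yielding a smaller set of informative views.
import numpy as np

def select_informative_views(view_embeddings: list[np.ndarray],
                             similarity_threshold: float = 0.9) -> list[int]:
    kept: list[int] = []
    for i, emb in enumerate(view_embeddings):
        is_redundant = False
        for j in kept:
            other = view_embeddings[j]
            sim = np.dot(emb, other) / (np.linalg.norm(emb) * np.linalg.norm(other))
            if sim >= similarity_threshold:
                is_redundant = True
                break
        if not is_redundant:
            kept.append(i)
    return kept
```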
- Substantial computational resources may be used to process each 2D image during each iteration of the process of generating the 3D model.
- the viewpoint selection technique described herein can greatly enhance the speed and efficiency with which a high-quality 3D model of an object is generated.
- An example of the above-described iterative model-generation process is illustrated in FIGS.11A-11D. In this example, a high-quality 3D model of a castle is generated based on a single image of the castle (or even based on a text description of the castle, without any images of the castle) using an iterative process involving image-based 3D model construction (e.g., synthetic photogrammetry), view capture, and AI-based image generation conditioned by the embeddings of the contextual data provided by the user.
- FIG.11A illustrates step (A) of the process.
- the user provides a prompt which includes a description of the castle to be modeled. That description may include text (e.g., “a medieval castle”), zero or more images (e.g., an overhead image of the castle), or other types of data. If no images are provided, the system can generate concept art (e.g., one or more conceptual views of the castle) and display that concept art to the user for approval, before proceeding to steps (B)-(D).
- FIG.11B illustrates step (B) of the process.
- In step (B), the expert system controls a CAD tool (e.g., a 3D model reconstruction tool) to produce the initial construction of the castle based on the user’s description (or based on the system-generated concept art).
- FIG.11C illustrates step (C) of the process.
- In step (C), the viewpoints (e.g., optimal viewpoints) from which views of the model will be captured are determined, and 2D views of the model from those viewpoints are obtained through simulation.
- FIG.11D illustrates step (D) of the process.
- the synthetic 2D views of the model are provided as inputs to an AI model (e.g., stable diffusion model), which generates improved versions of those views that are aligned with the contextual data derived from the user’s content.
- a modeling system 100 may be configured to generate 3D models of objects (e.g., virtual objects) using a constructive workflow.
- modeling system 100 may iteratively construct 3D models 144 having different structures (e.g., different geometric forms) based on (1) a prompt (e.g., a text prompt) characterizing the modeled object and/or a desired aesthetic of the object, (2) conditioning data 127 objectively and mathematically representing the characteristics of the object and/or the aesthetic, and/or (3) an existing representation of the object (e.g., a 3D model 144 of the object or a set of one or more views of the object).
- the modeling system 100 may include an image-generating model 120, a prompt facility 130, a 3D model generator 150, and a view capturer 152.
- the image-generating model 120 includes an autoencoder 122, a neural network 124, a vector database 128, and a conditioning model 126.
- the image-generating model 120 is configured to generate output images (e.g., output views 142 of an object) based on input images (e.g., input views 140 of the object) and on conditioning data 127.
- the conditioning data 127 may be provided, for example, by a conditioning model 126 based on conditioning information received from the prompt facility 130.
- the conditioning model encodes the conditioning information (e.g., generates an embedding of the conditioning information in the latent embedding space of the vector database 128), searches the vector database 128 for one or more embeddings similar to (e.g., most similar to) the embedding of the conditioning input, and provides those one or more embeddings to the neural network 124 of the image-generating model 120 as conditioning data 127.
- the image-generating model 120 may use the conditioning data 127 to condition a process by which the neural network 124 generates new embeddings representing the output views 142 based on the embeddings representing the input views 140.
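- The conditioning flow around the image-generating model 120 can be summarized with the following hypothetical sketch. The callables encode_text, search_vector_db, and generate_views are placeholders for the conditioning model 126, the vector database 128, and the diffusion backbone; none of these names come from the patent.

```python
# Hypothetical sketch of latent-space conditioning: embed the conditioning information,
# retrieve the nearest stored embeddings, and pass both the input-view embeddings and
# the retrieved embeddings to the image generator.
def condition_and_generate(input_view_embs, conditioning_text,
                           encode_text, search_vector_db, generate_views,
                           top_k: int = 4):
    cond_emb = encode_text(conditioning_text)                   # conditioning model 126 (assumed)
    conditioning_data = search_vector_db(cond_emb, top_k)       # vector database 128 (assumed)
    return generate_views(input_view_embs, conditioning_data)   # image generator (assumed)
```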
- the image-generating model 120 is a diffusion model (e.g., a conditioned stable diffusion model).
- An example architecture of a conditioned stable diffusion model is illustrated in FIG.17.
- the image-generating model 120 is a blender model.
- An example architecture of a blender model including a stable diffusion model and a ControlNet model is illustrated in FIG.25B.
- the image-generating model 120 may modify and/or manipulate the internal representations (e.g., embeddings) of input images to generate new output images (a process referred to herein as “latent space manipulation”).
- The AI-based system may access the latent space of the image-generating model 120 (e.g., a multi-dimensional space in which the image-generating model’s internal representations are stored). By doing so, the AI-based system may access the encoded features and/or characteristics (e.g., shapes, textures, colors, etc.) of the 3D graphical object. The AI-based system may then modify and/or manipulate such encoded features and/or characteristics until achieving a 3D graphical object that aligns with the creator’s goals and/or constraints. Such modifications and/or manipulation may be repeatedly and/or continuously performed based on feedback and/or performance metrics to refine the outcome of the 3D graphical object.
- the modeling system 100 performs workspace analysis to populate a vector database 128 with embeddings representing conditioning content 110.
- the conditioning content 110 includes artistic content (e.g., 3D models, 2D images, concept art, videos, music, etc.) associated with a user and/or a project. Some non-limiting examples of conditioning content are shown in FIGS. 7-8.
- obtaining the vector database 128 of embeddings representing artistic content involves identifying such artistic content, creating embeddings representing the artistic content (e.g., using one or more encoders), and projecting those embeddings into a shared latent space.
- A non-limiting example of a process of building a vector database 128 of embeddings representing artistic content is shown in FIG.16.
- the embeddings 113 of the artistic content are created by a multi-modal indexer 112.
- the embeddings stored in the vector database 128 can be used to condition the generation of images by the image-generating model 120, such that the generated images are aesthetically consistent with the user- and/or project-specific artistic content. See, e.g., the description of “creative input” and “workspace analysis” in the section titled “Some Examples.”
- the system 100 may receive and/or obtain creative input related to a 3D graphical object.
- the creative input may originate from a human operating a computing device that executes and/or has access to the system 100. Additionally or alternatively, the creative input may originate from a local or remote computing device or system. Regardless of whether the creative input originates from a human or a computing device, the entity providing the creative input may be referred to herein as the creator. In certain implementations, the system 100 may be able to gain and/or form an understanding of the creator’s vision and/or requirements for the desired 3D graphical object based at least in part on the creative input.
- the system 100 may perform and/or execute a workspace analysis on the computing device that originates and/or provides the creative input.
- the system 100 may analyze and/or evaluate the contents of the computing device.
- the multi-modal indexer 112 may apply and/or perform multi-modal indexing on the files, documents, objects, code, images, repositories, directories, codebases, and/or models found on the computing device.
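- A hedged sketch of this indexing step might look like the following: walk a workspace directory, embed each file with a modality-appropriate encoder, and collect (path, embedding) pairs for the vector database. The file-extension mapping, the encoder callables, and the use of a plain list in place of a real vector database are all illustrative assumptions rather than details from the patent.

```python
# Hypothetical sketch: build a multi-modal index of workspace content.
from pathlib import Path

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}
MODEL_EXTENSIONS = {".obj", ".fbx", ".glb"}
TEXT_EXTENSIONS = {".txt", ".md"}

def index_workspace(workspace_dir: str, embed_image, embed_model, embed_text):
    index = []
    for path in Path(workspace_dir).rglob("*"):
        suffix = path.suffix.lower()
        if suffix in IMAGE_EXTENSIONS:
            embedding = embed_image(path)
        elif suffix in MODEL_EXTENSIONS:
            embedding = embed_model(path)
        elif suffix in TEXT_EXTENSIONS:
            embedding = embed_text(path.read_text())
        else:
            continue  # no suitable encoder for this file type
        index.append((str(path), embedding))
    return index
```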
- the system 100 may analyze and/or evaluate such contents to gain insight into and/or understanding of the creator’s workspace structure, features, and/or relationships.
- insight and/or understanding may provide, serve as, and/or function as the foundation and/or basis for determining, identifying, and/or extrapolating the creator’s goals and/or constraints for the 3D graphical object.
- the system 100 may be able to tailor the goals and/or constraints for the 3D graphical object beyond the creative input based at least in part on such insight and/or understanding.
- the system 100 may translate and/or convert the creator’s goals and/or constraints (e.g., as indicated by a prompt provided to the image-generating model 120 by the prompt facility 130) into actionable parameters that guide the development of the 3D graphical object.
- the system 100 may rely on the vector database 128 to interpret the creator’s intentions based at least in part on the workspace analysis and the creative input.
- the system 100 may then generate a set of goals and/or parameters that define and/or outline requirements of the 3D graphical object and/or boundaries within which the 3D graphical object is developed based at least in part on the creator’s goals, constraints, and/or intentions.
- the vector database 128 may include and/or represent general or specific rules, industry standards, best practices, decision trees, behavior trees, and/or technical specifications related to 3D model generation.
- the system 100 may translate and/or convert the creator’s goals, constraints, and/or corresponding actionable parameters into a structured framework that guides part or all of the object creation process, thereby ensuring that the constructed 3D model complies with and/or satisfies the creator’s vision and/or expectations.
- the structured framework may provide, serve as, and/or function as the ...
- the system 100 may perform and/or execute extraction, transformation, and/or loading operations (collectively referred to herein as “deep ETL”) by delving into the intricacies of the 3D graphical objects.
- deep ETL may involve performing data extraction, transformation, and loading processes to gain insight into and/or understanding of the 3D graphical object’s structure and/or properties.
- deep ETL may entail extracting geometric details, textures, materials, and/or other attributes relevant to the creation of the desired 3D graphical object.
- deep ETL may further entail changing the formatting and/or normalizing certain values across such data. Additionally or alternatively, the system 100 may load the resulting deep ETL data into the vector database 128 to enrich its utility.
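- The extract/transform/load steps could be pictured with the hypothetical helper below; load_mesh and embed_asset, the attribute names, and the rounding rule are all assumptions made for illustration rather than details taken from the patent.

```python
# Hypothetical "deep ETL" sketch: extract attributes from a 3D asset, normalize them,
# and load the record plus an embedding into the index backing the vector database.
def deep_etl(asset_path: str, load_mesh, embed_asset, index: list) -> None:
    mesh = load_mesh(asset_path)                                # extract
    record = {
        "path": asset_path,
        "vertex_count": mesh.vertex_count,
        "materials": sorted(set(mesh.material_names)),          # transform: dedupe and sort
        "bounds": [round(value, 3) for value in mesh.bounds],   # transform: normalize precision
    }
    index.append((record, embed_asset(asset_path)))             # load
```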
- A non-limiting example of a process of querying the vector database 128 of the conditioning model 126 of the image-generating model 120 is shown in FIG.16.
- A non-limiting example of a process of performing retrieval augmented generation (RAG) with a conditioning model 126 is shown in FIG. 9.
- the modeling system 100 uses the prompt facility 130, image-generating model 120, 3D model generator 150, and view capturer 152 to construct a 3D model of the geometric form of an object.
- the prompt facility 130 can initiate a process of constructing a 3D model of an object by retrieving an initial model of the object or a set of views of the object from the vector database 128.
- the initial model of the object is a base 3D model (e.g., simple mesh). Some non-limiting examples of base 3D models are shown in FIG. 12.
- the prompt facility may provide a query to the image-generating model 120.
- the query may provide a description of an object and request an initial model and/or initial views of the object.
- the conditioning model 126 may encode the description of the object (e.g., may generate an embedding of the description in the latent embedding space of the vector database 128), search the vector database 128 for one or more embeddings similar to (e.g., most similar to) the embedding of the description, and provide a model and/or a set of images corresponding to those embeddings to the prompt facility 130.
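- As a concrete illustration of this retrieval step, the sketch below assumes a description embedding has already been produced, ranks stored embeddings by cosine similarity, and returns the closest assets. The `query_vector_db` helper, the in-memory list standing in for the vector database 128, and the asset identifiers are hypothetical; the disclosure does not mandate any particular library or storage backend.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def query_vector_db(query_embedding: np.ndarray, db: list, top_k: int = 3) -> list:
    """Return asset ids whose stored embeddings are most similar to the query.

    db is a list of (embedding, asset_id) pairs standing in for the vector database.
    """
    scored = sorted(((cosine_similarity(query_embedding, emb), asset_id)
                     for emb, asset_id in db), reverse=True)
    return [asset_id for _, asset_id in scored[:top_k]]

# Usage with placeholder data: real embeddings would come from the
# conditioning model's encoder rather than from random vectors.
db = [(np.random.rand(512), f"base_model_{i}") for i in range(100)]
query = np.random.rand(512)  # stands in for an encoded text description
print(query_vector_db(query, db))
```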
- the prompt facility 130 may orchestrate an iterative modeling process in which images of an object (e.g., rendered images of a 3D model 144 of the object) are provided to the image-generating model 120 as input views 140, a prompt (e.g., a prompt including a query and conditioning information) is provided as input to the image-generating model 120 by the prompt facility 130, the image-generating model 120 generates new images (e.g., output views 142) of the object based on the input views 140 and the conditioning information, a 3D model generator 150 constructs a new 3D model 144 of the object based on the output views 142, a view capturer obtains more images of the object (e.g., rendered images of the new 3D model 144 of the object), etc.
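- The loop below is a minimal sketch of that orchestration, assuming callables `image_generating_model`, `model_generator`, and `view_capturer` that stand in for components 120, 150, and 152; the real components, their interfaces, and the stopping test are not prescribed by this disclosure.

```python
def iterative_modeling(prompt, initial_views, image_generating_model,
                       model_generator, view_capturer,
                       is_satisfactory, max_iters=5):
    """Illustrative orchestration of the iterative modeling process."""
    input_views = initial_views
    model_3d = None
    for _ in range(max_iters):
        # Generate new output views conditioned on the prompt and current input views.
        output_views = image_generating_model(prompt=prompt, views=input_views)
        # Construct a new or updated 3D model from the generated views.
        model_3d = model_generator(output_views)
        if is_satisfactory(model_3d):
            break
        # Render fresh views of the updated model to seed the next iteration.
        input_views = view_capturer(model_3d)
    return model_3d
```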
- the 3D model generator 150 constructs a 3D model 144 from output views 142 using any suitable 3D model reconstruction techniques (e.g., any combination of multi-view reconstruction, depth estimation from stereo images, single image 3D reconstruction, volumetric reconstruction, neural rendering, synthetic photogrammetry, etc.).
- the technique used for constructing the 3D model of the object can vary depending on what types of views of the object are available. For example, when multiple views are captured from different angles, multi-view reconstruction can be used, whereas when multiple images are captured from a similar angle (e.g., an overhead view), depth estimation from stereo images can be used.
- the view capturer generates images of an object from various viewpoints by rendering views of the 3D model 144 from those viewpoints.
- the view capturer 152 uses simulation and photogrammetry to render the views.
- the view capturer 152 performs an improved process for view generation as described in further detail herein (e.g., with reference to FIG.2). A non-limiting example of the operation of the view capturer 152 is shown in FIG.10.
- the system 100 may construct 3D models that implement realistic textures, materials, and/or other modeling components (e.g., using synthetic photogrammetry techniques). In these examples, the system 100 utilizes and/or relies on synthetic data to provide accurate, detailed representations of objects and/or surfaces.
- the system 100 may generate synthetic two-dimensional (2D) image data from 3D models and then use such 2D image data to train and/or condition the image-generating model 120. Additionally or alternatively, the system 100 may construct a 3D model from such 2D image data.
- the view capturer 152 performs an improved process of viewpoint selection for photogrammetry to provide comprehensive coverage of the object’s details from one or more informative perspectives (e.g., the most informative perspectives) using a limited (e.g., minimal) number of views.
- the view capturer 152 may analyze the object represented in the images from various viewpoints.
- the view capturer 152 may score those angles/views relative to one another based on their similarities and/or dissimilarities to identify the angles/views that provide a diverse and/or meaningful data set (e.g., the most diverse and/or meaningful data set) corresponding to the object represented in the images.
- Those selected viewpoints can then be used by the system 100 to construct a 3D model of the object.
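- One way to realize this scoring and selection is a greedy pass over view embeddings that repeatedly keeps the view least similar to those already kept. The sketch below assumes the embeddings are provided as a NumPy array; the greedy strategy itself is only an illustrative choice, not a requirement of the disclosure.

```python
import numpy as np

def select_diverse_views(embeddings: np.ndarray, k: int) -> list:
    """Greedily pick k view indices whose embeddings are mutually dissimilar.

    embeddings: (n_views, dim) array of per-view feature vectors.
    """
    n = embeddings.shape[0]
    normed = embeddings / (np.linalg.norm(embeddings, axis=1, keepdims=True) + 1e-12)
    sims = normed @ normed.T                         # pairwise cosine similarities
    selected = [int(np.argmin(sims.mean(axis=1)))]   # seed with the most atypical view
    while len(selected) < min(k, n):
        # Each candidate is judged by its highest similarity to any already-selected view.
        worst = sims[:, selected].max(axis=1)
        worst[selected] = np.inf                     # never re-select a kept view
        selected.append(int(np.argmin(worst)))
    return selected
```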
- the system 100 performs object-construction operations by utilizing the authoring tool(s), settings, and/or artistic technique(s) selected in the tool-forming operation to create the 3D model.
- the system 100 models, textures, rigs, animates, and/or refines the 3D model via the authoring tool(s), settings, and/or artistic technique(s).
- the modeling process may involve creating a geometric form and/or shape helpful to achieve the proper aesthetic in view of the creator’s goals and/or constraints.
- a modeling system 160 may be configured to apply textures and/or materials to surfaces of 3D models using a constructive workflow.
- the modeling system 160 may iteratively construct textured 3D models 192 having different textures and/or materials applied to their surfaces based on (1) a prompt (e.g., a text prompt) characterizing the desired aesthetic of the textures and/or materials, (2) conditioning data 127 objectively and mathematically representing the characteristics of textures and/or materials having the aesthetic, and/or (3) an existing 3D model 180 of an object.
- the modeling system 160 may include an image-generating model 120, a prompt facility 130, a texture application tool 190, and a view capturer 152.
- the modeling system 160 uses the prompt facility 130, image-generating model 120, texture application tool 190, and view capturer 152 to construct a textured 3D model 192 of an object.
- the prompt facility 130 can initiate a process of texturing a 3D model of an object by retrieving a 3D model 180 of the object and at least one image representing a texture from the vector database 128.
- the 3D model 180 of the object is a base 3D model (e.g., simple mesh) or 3D model 144 generated by the system 100 of FIG.1A.
- the prompt facility may provide a query to the image-generating model 120.
- the query may provide a description of a texture and request an image representing the texture.
- the conditioning model 126 may encode the description of the texture (e.g., may generate an embedding of the description in the latent embedding space of the vector database 128), search the vector database 128 for one or more embeddings similar to (e.g., most similar to) the embedding of the description, and provide an image corresponding to the embedding to the prompt facility 130.
- the system 160 may then use the retrieved image as an initial version of input image 170 or output image 172 during the texturing process.
- the prompt facility 130 may orchestrate an iterative texturing process in which images of an object (e.g., rendered images of a textured 3D model 192 of the object) are provided to the image-generating model 120 as input images 170, a prompt (e.g., a prompt including a query and conditioning information) is provided as input to the image-generating model 120 by the prompt facility 130, the image-generating model 120 generates a new image (e.g., output image 172) of the object based on the input image 170 and the conditioning information, a texture application tool 190 applies the texture depicted in the output image 172 to a surface of a 3D model 180 to construct a textured 3D model 192, a view capturer 152 obtains another image of the object (e.g., a rendered image of the new textured 3D model 192 of the object), etc.
- the texture application tool 190 applies a texture and/or material depicted in an output image 172 to at least one surface of a 3D model 180 using any suitable texturing techniques.
- texture-application techniques are shown in FIG.18 and FIGS.24A-I.
- texture-application methods optionally performed by the texture application tool 190 are shown in the flowcharts of FIGS.20A-20E.
- the texturing of a 3D model may involve applying the synthetic data, such as textures and/or materials, to the geometric form and/or shape to achieve the desired realistic appearance.
- the system 160 may map the textures onto the geometric form and/or shape of the object to provide the proper color, reflectivity, and/or surface texture.
- the functionality of modeling systems 100 and 160 can be combined in a joint system that generates 3D models of objects and applies textures / materials to the surfaces of those models using a combined constructive workflow.
- the 3D model generator 150 can implement the functionality of the texture application tool 190, such that the modeling system 100 constructs textured models.
- FIG.2 depicts a flowchart of an example method 200 for viewpoint selection for image- based construction (e.g., reconstruction) of a three-dimensional (3D) representation (e.g., model) of an object.
- the method includes steps 202-210.
- the method is performed by a view capturer (e.g., view capturer 152 of modeling system 100 or 160).
- the view capturer 152 obtains first images of the object.
- the first images include images of a physical object (e.g., images captured by a camera).
- the first images include images derived from a 3D representation of the object.
- the first images can include synthetically created images of the object, which can include rendered images of a 3D representation of the object.
- the first images depict views of the object from a set of viewpoints (e.g., angles or perspectives).
- the view capturer obtains encodings (e.g., embeddings) of the first images.
- the encodings or embeddings can be extracted or otherwise derived from the first images using any suitable techniques (e.g., using an encoder, which can be a pre-trained deep learning model).
- the encoder can be trained to generate feature representations (e.g., embeddings) from images, which can then be used for image retrieval, classification, clustering, similarity measurement, etc.
- Such encoders can be implemented using Convolutional Neural Networks (CNNs), Vision Transformers (ViTs), etc.
- the encoder used to generate the encodings of the first images is similar or identical to the encoder of the image- generating model 120.
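- The snippet below shows one plausible encoder of this kind: a pre-trained torchvision CNN with its classification head removed, so its pooled features act as image embeddings (assumes torchvision 0.13 or later). Any comparable CNN, ViT, or the image-generating model's own encoder could fill the same role.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Pre-trained ResNet-50 with the final classification layer removed.
weights = models.ResNet50_Weights.DEFAULT
backbone = models.resnet50(weights=weights)
encoder = torch.nn.Sequential(*list(backbone.children())[:-1]).eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed_image(path: str) -> torch.Tensor:
    """Return a 2048-dimensional embedding for the image at `path`."""
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    return encoder(x).flatten(1).squeeze(0)
```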
- the 3D representation can be obtained using any suitable techniques.
- a 3D model generator 150 generates the 3D representation of the object based on a set of images of the object.
- the set of images can be obtained using any suitable techniques.
- the object is a physical object and the set of images include camera-captured images of the physical object.
- the object is a virtual object and the set of images include images of the object provided by an image-generating model 120.
- the view capturer calculates scores (e.g., similarity scores) based on the encodings of the first images.
- the scores can be calculated using any technique that provides a score indicating an amount (e.g., degree) of similarity between or among a respective group of two or more of the first images.
- Techniques for calculating similarity between images can include comparing images based on their pixel values, features, or embeddings.
- the similarity score for two images is determined based on a measure of the similarity between the two images’ encodings (e.g., embeddings).
- the similarity score for two images can be determined based on the cosine similarity, dot product similarity, or Euclidean distance between vector embeddings of the two images.
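- For reference, the three measures named above can be computed directly on the embedding vectors, as in this small helper (a hypothetical convenience function; any similarity implementation would serve):

```python
import numpy as np

def similarity(e1: np.ndarray, e2: np.ndarray, metric: str = "cosine") -> float:
    """Score how alike two view embeddings are; larger always means more similar."""
    if metric == "cosine":
        return float(np.dot(e1, e2) / (np.linalg.norm(e1) * np.linalg.norm(e2) + 1e-12))
    if metric == "dot":
        return float(np.dot(e1, e2))
    if metric == "euclidean":
        return -float(np.linalg.norm(e1 - e2))  # negate distance so larger means more similar
    raise ValueError(f"unknown metric: {metric}")
```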
- the view capturer obtains, based on the scores, second images of the object.
- the second images include one or more of the first images (e.g., a selected subset of the first images).
- the subset of images can be selected based on their similarity or lack of similarity. For example, a subset of the first images exhibiting a high level of variation from one another (e.g., images having low similarity scores) can be selected.
- the number of second images is less than the number of first images.
- the second images include one or more images created (e.g., captured using a camera or rendered based on the 3D representation of the object) from new viewpoints not included within the viewpoints of the first images, for example, as shown in FIG.11C. The new viewpoints for the second images can be selected based on the similarity scores for the first images.
- a 3D model generator 150 generates (e.g., constructs) an updated 3D representation of the object, based on the second images, for example, as shown in FIG. 11D.
- a determination is made as to whether the updated 3D representation of the object is satisfactory. If so, the method 200 can end. Otherwise, the method can return to step 202 and repeat the following steps until a satisfactory 3D representation of the object is obtained.
- The determination of whether a 3D representation of the object is satisfactory can be made using any suitable technique.
- FIG.3 depicts a flowchart of an example method 300 for constructing (e.g., reconstructing) a geometric form of a 3D model of an object.
- the method includes steps 302-310.
- the method is performed by a modeling system 100.
- a first 3D model of the object is obtained.
- the first 3D model of the object is obtained from a vector database.
- the first 3D model of the object can be obtained from the database based on a query (e.g., user-provided query).
- the first 3D model can be retrieved based on at least one of a received description of the object, one or more images of the object (or a similar object), or a combination thereof.
- first encodings (e.g., first embeddings) of first images of the object are obtained.
- the first images include images of a physical object (e.g., images captured by a camera).
- the first images include images derived from a 3D model of the object (e.g., a 3D model of the object constructed during a previous iteration of steps 304-310).
- the first images can include synthetically created images of the object, which can include rendered images of a 3D model of the object.
- the first images depict views of the object from a set of viewpoints.
- the first images of the object can be obtained from a database (e.g., vector database 128), provided by a view capturer 152, or obtained from any other suitable source.
- an encoder of the image-generating model 120 extracts or otherwise derives the first encodings from the first images.
- one or more of the first images and the encodings corresponding thereto are obtained from a vector database 128 based on a query.
- the query can include a description of the object, one or more images of the object (or a similar object), or a combination thereof.
- the image-generating model 120 generates second encodings of second images of the object.
- the second encodings may be based on the first encodings and on conditioning data.
- the conditioning data can include a description (e.g., text description) of the object and/or a description of one or more geometric attributes of the object.
- the second encodings are second embeddings, and the second embeddings are generated by conditioning an image generation process performed in a latent space of the image-generating model 120 on the first embeddings and/or on the conditioning data.
- generating the second embeddings can include providing the first embeddings as input to the image-generating model 120.
- the image-generating model 120 can include any combination of models for generating images from one or more embeddings.
- the image-generating model 120 can include a latent, text-to-image diffusion model.
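- As one possible stand-in for such a model, the open-source diffusers library can condition a latent diffusion process on both a text prompt and an existing rendered view; the model identifier, file names, and parameter values below are illustrative only, and the disclosure does not require this library or any particular checkpoint.

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

input_view = Image.open("render_front.png").convert("RGB")   # hypothetical rendered view
output_view = pipe(
    prompt="a weathered bronze statue of a horse, studio lighting",
    image=input_view,       # conditions the latent denoising on the prior view
    strength=0.6,           # how far the new view may depart from the input view
    guidance_scale=7.5,
).images[0]
output_view.save("generated_view.png")
```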
- the 3D model generator 150 constructs, based on the second encodings, a geometric form of a 3D model (e.g., a new or updated 3D model) of the object, thereby defining one or more shapes, dimensions, and/or orientations of one or more portions (e.g., surfaces) of the 3D model.
- the system 100 determines whether the geometric form of the 3D model of the object satisfies one or more criteria. If so, the method 300 can end. Otherwise, the method can return to step 304 and repeat the following steps until a satisfactory geometric form is obtained. The system 100 can determine whether the geometric form of a 3D model of an object is satisfactory using any suitable technique.
- FIG.4 depicts a flowchart of an example method 400 for applying textures and/or materials to surfaces of a three-dimensional (3D) model.
- the method is performed by a modeling system 160.
- a first 3D model of the object is obtained.
- the first 3D model of the object is obtained from a vector database.
- the first 3D model of the object can be obtained from the database based on a query (e.g., user-provided query).
- the first 3D model can be retrieved based on at least one of a received description of the object, one or more images of the object (or a similar object), or a combination thereof.
- a first encoding (e.g., a first embedding) of a first image is obtained.
- the first image is an image of a physical object (e.g., an image captured by a camera).
- the first image is derived from a 3D model of the object (e.g., a 3D model of the object constructed during a previous iteration of steps 404-410).
- the first image is a rendered view or at least a portion of a view of a surface of the first 3D model of the object.
- the first image depicts a view of the object from a viewpoint.
- the first image of the object can be obtained from a database (e.g., vector database 128), provided by a view capturer 152, or obtained from any other suitable source.
- an encoder of the image-generating model 120 extracts or otherwise derives the first encoding from the first image.
- the first image and the first encoding corresponding thereto are obtained from a vector database 128 based on a query.
- the query can include a description of a texture and/or material.
- the image-generating model 120 generates a second encoding of a second image.
- the second encoding is based on the first encoding and conditioning data.
- the conditioning data can include any information related to textures and/or materials.
- the conditioning data can include a description of one or more visual attributes and/or optical attributes of a surface, such as a description of a texture, material, shading, lighting, reflectivity, and/or color of the surface.
- the second encoding is a second embedding, and the second embedding is generated by conditioning an image generation process performed in a latent space of the image-generating model 120 on the first embedding and/or on the conditioning data.
- generating the second embedding can include providing the first embedding as input to the image-generating model 120.
- the image-generating model 120 can include any combination of models for generating images from one or more embeddings.
- the image- generating model 120 can include a latent, text-to-image diffusion model.
- the texture application tool 190 maps the second image to a surface of a three-dimensional (3D) model of an object, thereby defining a texture and/or material of the surface of the 3D model. After one or more textures and/or materials have been applied to one or more surfaces of the 3D model, a second 3D model of the object is effectively created.
- the system 160 determines whether the texture and/or material of the surface of the 3D model satisfy one or more criteria. If so, the method 400 can end. Otherwise, the method can return to step 404 and repeat the following steps until a satisfactory texture mapping is obtained.
- the system 160 can determine whether the texture and/or material of the surface of the 3D model are satisfactory using any suitable technique. For example, the determination can be based on a user input and/or on the system’s analysis of one or more attributes of the 3D model constructed at step 408.
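- As an illustration of the mapping operation performed by the texture application tool 190, the sketch below assumes the 3D model already carries a UV layout and simply re-wraps that layout with the newly generated output image using the trimesh library. File names are placeholders, and trimesh is only one of many tools that could perform this mapping.

```python
import trimesh
from PIL import Image

# Load a mesh that already has per-vertex UV coordinates (hypothetical paths).
mesh = trimesh.load("model_180.obj", process=False)
texture = Image.open("output_image_172.png")

# Re-wrap the existing UV layout with the newly generated texture image.
mesh.visual = trimesh.visual.TextureVisuals(uv=mesh.visual.uv, image=texture)
mesh.export("textured_model_192.glb")
```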
- FIG.5 depicts a flowchart of an example method 500 for constructing (e.g., reconstructing) a 3D model of an object.
- the method includes steps 502-508.
- the method is performed by a modeling system.
- first encodings (e.g., first embeddings) of first images of the object are obtained.
- the first images include images of a physical object (e.g., images captured by a camera).
- the first images include images derived from a 3D model of the object (e.g., a 3D model of the object constructed during a previous iteration of steps 502-508).
- the first images can include synthetically created images of the object, which can include rendered images of a 3D model of the object.
- the first images depict views of the object from a set of viewpoints.
- the first images of the object are provided by a view capturer 152.
- the view capturer 152 provides the first images of the object using the view generation method 200.
- the first images of the object can be obtained from a database (e.g., vector database 128) or from any other suitable source.
- an encoder of the image-generating model 120 extracts or otherwise derives the first encodings from the first images.
- one or more of the first images and the encodings corresponding thereto are obtained from a vector database 128 based on a query.
- the query can include a description of the object, one or more images of the object (or a similar object), or a combination thereof.
- the image-generating model 120 generates second encodings of second images of the object.
- the second encodings may be based on the first encodings and on conditioning data.
- the conditioning data can include a description (e.g., text description) of the object.
- the description of the object includes a description of one or more geometric attributes of the object, one or more visual attributes of the object, and/or one or more optical attributes of the object.
- the conditioning data include a description of one or more alterations to (1) an aesthetic of the object as depicted in the first images or represented in an existing 3D model of the object, (2) a geometric attribute of the object as depicted in the first images or represented in the existing 3D model of the object, (3) a visual attribute of the object as depicted in the first images or represented in the existing 3D model of the object, (4) an optical attribute of the object as depicted in the first images or represented in the existing 3D model of the object, etc.
- the second encodings are second embeddings, and the second embeddings are generated by conditioning an image generation process performed in a latent space of the image-generating model 120 on the first embeddings and/or on the conditioning data.
- generating the second embeddings can include providing the first embeddings as input to the image-generating model 120.
- the image-generating model 120 can include any combination of models for generating images from one or more embeddings.
- the image-generating model 120 can include a latent, text-to-image diffusion model.
- the 3D model generator 150 constructs, based on the second encodings, a 3D model (e.g., a new or updated 3D model) of the object, which indicates (1) a geometric form of the object and (2) a texture and/or material of at least one surface of the object.
- the system determines whether the 3D model of the object satisfies one or more criteria. If so, the method 500 can end. Otherwise, the method can return to step 502 and repeat the following steps until the 3D model of the object satisfies one or more criteria.
- the system 100 can determine whether the 3D model satisfies one or more criteria using any suitable technique.
- the determination can be based on a user input and/or on the system’s analysis of one or more attributes of the 3D model.
- the criteria include (1) receipt of user input indicating that the 3D model of the object is satisfactory, (2) receipt of user input requesting termination of modeling, (3) expiry of a maximum time period allocated for the modeling, and/or (4) use of a maximum amount of computational resources allocated for the modeling.
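- A simple check combining those criteria might look like the following; the argument names and budget values are illustrative placeholders rather than required parameters.

```python
import time

def should_stop(user_feedback: str, start_time: float,
                max_seconds: float, gpu_hours_used: float,
                max_gpu_hours: float) -> bool:
    """Return True when any of the stopping criteria listed above is met."""
    if user_feedback in ("satisfactory", "terminate"):
        return True                                   # criteria (1) and (2)
    if time.monotonic() - start_time > max_seconds:
        return True                                   # criterion (3): time budget
    if gpu_hours_used >= max_gpu_hours:
        return True                                   # criterion (4): compute budget
    return False
```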
- a method for generating 3D objects using AI involves (1) obtaining a database of embeddings representing artistic content associated with a user and/or a project, (2) selecting and configuring CAD tools suitable for generating 3D objects consistent with that artistic content (a process sometimes referred to herein as “tool-forming”), and (3) generating 3D objects consistent with that artistic content via an iterative process of synthetic photogrammetry and latent space conditioning.
- agents (e.g., expert systems) can select CAD tools to be used in the object-generating process and configure those tools (e.g., set values of the tools’ parameters) to create 3D objects having aesthetic qualities that match or align with the aesthetic qualities of the user- and/or project-specific artistic content.
- agents also may control the operation of the CAD tools during the process of generating the 3D objects.
- an AI-based system may be designed to simulate, mimic, and/or imitate the behaviors and/or decision-making of an expert human graphical artist.
- the AI-based system may determine, identify, and/or extrapolate the goals and/or vision for a 3D graphical object by interpreting a request, prompt, and/or input that initiates the object- creation process in view of the contents of the workspace from which the request, prompt, and/or input originates. By doing so, the AI-based system may be able to effectuate and/or realize the goals and/or vision for the 3D graphical object with improved accuracy and/or precision.
- the AI-based system may perform and/or execute a tool-forming operation by selecting and/or identifying the appropriate authoring tool(s), settings, and/or artistic technique(s) based at least in part on the structured framework (e.g., goals, constraints, and/or corresponding actionable parameters).
- the AI-based system may select suitable authoring tool(s) from a range of options.
- such authoring tools may include and/or represent software modules that form, shape, and/or manipulate certain features of the 3D graphical object. Examples of such authoring tools include, without limitation, modeling modules, texturing modules, rigging modules, animation modules, combinations or variations of one or more of the same, and/or any other suitable authoring tools.
- the AI-based system may select suitable authoring tool(s) to ensure that the final object adheres to the creator’s goals and constraints.
- the AI-based system may create and/or generate synthetic data (e.g., textures, materials, modeling components, etc.) that facilitates enhancing the realism and/or accuracy of the resulting 3D graphical object.
- the AI-based system may condition and/or train AI models involved in the object-creation process based at least in part on the synthetic data. By doing so, the AI-based system may improve the AI-models’ ability to produce high-quality, realistic 3D graphical objects.
- the AI-based system may integrate the synthetic data into the tool-forming operation to enhance the realism and/or accuracy of the resulting 3D graphical object.
- the tool-forming operation may provide, serve as, and/or function as the foundation and/or basis for the actual creation of the 3D graphical object.
- the AI-based system may be able to model, texture, and/or animate the 3D graphical object to achieve a realistic, high-quality result and/or outcome.
- the synthetic data may enable the AI-based system to implement improved texture mapping, material application, and/or modeling during the object-creation process.
- the rigging of the 3D graphical object may involve creating and/or setting a skeleton structure within the underlying object to control the movement and/or animation.
- the animation of the 3D graphical object may involve causing the skeleton structure of the object to move, act, and/or behave in certain ways to satisfy the creator’s goals and/or constraints.
- the AI-based system may perform and/or execute an adversarial fine- tuning operation that reduces the need for human oversight and/or intervention.
- such adversarial fine-tuning may involve the AI-based system applying adversarial training techniques to improve the AI models’ abilities to create high-quality 3D graphical objects by updating and/or adjusting the models’ parameters and/or internal representations.
- the AI-based system may present the AI-models with challenging scenarios that force the models to learn more robust and/or accurate representations.
- the AI-based system may complete the object-creation process by preparing the 3D graphical object for integration into the creator’s project (e.g., a video game, simulation, etc.).
- the AI-based system may implement certain quality-assurance checks on the resulting 3D graphical object.
- the AI-based system may also optimize the 3D graphical object to perform satisfactorily in the creator’s target environment. Such optimization may involve reducing the polygon count, compressing textures, and/or adjusting the geometric form of the 3D graphical object. The AI-based system may then package the 3D graphical object for export to a format compatible with the creator’s project and/or target environment. [0208] In some examples, the AI-based system may implement any of a variety of different types of AI models and/or techniques to achieve the objectives described herein.
- AI models include, without limitation, machine learning models, convolutional neural networks, recurrent neural networks, supervised learning models, unsupervised learning models, linear regression models, logistic regression models, support vector machine models, Naive Bayes models, k-nearest neighbor models, k-means models, random forest models, combinations or variations of one or more of the same, and/or any other suitable AI models.
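- Returning to the optimization step described a few paragraphs above (reducing polygon count and compressing textures), one hedged sketch of such a pass uses Open3D and Pillow; the file paths, triangle budget, and texture size are placeholders, and other decimation or compression tools would serve equally well.

```python
import open3d as o3d
from PIL import Image

# Decimate the mesh toward a polygon budget suitable for the target environment.
mesh = o3d.io.read_triangle_mesh("asset_highres.obj")
decimated = mesh.simplify_quadric_decimation(target_number_of_triangles=5000)
o3d.io.write_triangle_mesh("asset_game_ready.obj", decimated)

# Downscale and recompress the texture to reduce memory footprint.
tex = Image.open("asset_albedo.png").convert("RGB")
tex.resize((1024, 1024)).save("asset_albedo.jpg", quality=85)
```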
- Computer-Based Implementations [0209] Techniques operating according to the principles described herein may be implemented in any suitable manner. Included in this disclosure is a description of the steps and acts of various processes that generate 3D objects.
- the processing and decision blocks of the flow charts above represent steps and acts that may be included in algorithms that carry out these various processes.
- Algorithms derived from these processes may be implemented as software integrated with and directing the operation of one or more single- or multi-purpose processors, may be implemented as functionally-equivalent circuits such as a Digital Signal Processing (DSP) circuit, Field Programmable Gate Array (FPGA), or an Application-Specific Integrated Circuit (ASIC), or may be implemented in any other suitable manner.
- the flow charts illustrate the functional information one of ordinary skill in the art may use to fabricate circuits or to implement computer software algorithms to perform the processing of a particular apparatus carrying out the types of techniques described herein. It should also be appreciated that, unless otherwise indicated herein, the particular sequence of steps and/or acts described in each flow chart is merely illustrative of the algorithms that may be implemented and can be varied in implementations and embodiments of the principles described herein. [0210] Accordingly, in some embodiments, the techniques described herein may be embodied in computer-executable instructions implemented as software, including as application software, system software, firmware, middleware, embedded code, or any other suitable type of software.
- Such computer-executable instructions may be written using any of a number of suitable programming languages and/or programming or scripting tools, and also may be compiled as executable machine language code or intermediate code that is executed on a framework or virtual machine.
- these computer-executable instructions may be implemented in any suitable manner, including as a number of functional facilities, each providing one or more operations to complete execution of algorithms operating according to these techniques.
- a “functional facility,” however instantiated, is a structural component of a computer system that, when integrated with and executed by one or more computers, causes the one or more computers to perform a specific operational role.
- a functional facility may be a portion of or an entire software element.
- a functional facility may be implemented as a function of a process, or as a discrete process, or as any other suitable unit of processing. If techniques described herein are implemented as multiple functional facilities, each functional facility may be implemented in its own way; all need not be implemented the same way. Additionally, these functional facilities may be executed in parallel and/or serially, as appropriate, and may pass information between one another using a shared memory on the computer(s) on which they are executing, using a message passing protocol, or in any other suitable way.
- functional facilities include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types.
- the functionality of the functional facilities may be combined or distributed as desired in the systems in which they operate.
- one or more functional facilities carrying out techniques herein may together form a complete software package.
- These functional facilities may, in alternative embodiments, be adapted to interact with other, unrelated functional facilities and/or processes, to implement a software program application.
- Some exemplary functional facilities have been described herein for carrying out one or more tasks. It should be appreciated, though, that the functional facilities and division of tasks described is merely illustrative of the type of functional facilities that may implement the exemplary techniques described herein, and that embodiments are not limited to being implemented in any specific number, division, or type of functional facilities. In some implementations, all functionality may be implemented in a single functional facility.
- Computer-executable instructions implementing the techniques described herein may, in some embodiments, be encoded on one or more computer-readable media to provide functionality to the media.
- Computer-readable media include magnetic media such as a hard disk drive, optical media such as a Compact Disk (CD) or a Digital Versatile Disk (DVD), a persistent or non-persistent solid-state memory (e.g., Flash memory, Magnetic RAM, etc.), or any other suitable storage media.
- a computer-readable medium may be implemented in any suitable manner, including as computer-readable storage media 606 of FIG. 6 described below (i.e., as a portion of a computing device 600) or as a stand-alone, separate storage medium.
- As used herein, “computer-readable media” (also called “computer-readable storage media”) refers to tangible storage media. Tangible storage media are non-transitory and have at least one physical, structural component.
- In a “computer-readable medium,” as used herein, at least one physical, structural component has at least one physical property that may be altered in some way during a process of creating the medium with embedded information, a process of recording information thereon, or any other process of encoding the medium with information. For example, a magnetization state of a portion of a physical structure of a computer-readable medium may be altered during a recording process.
- some techniques described above comprise acts of storing information (e.g., data and/or instructions) in certain ways for use by these techniques. In some implementations of these techniques—such as implementations where the techniques are implemented as computer- executable instructions—the information may be encoded on a computer-readable storage media.
- these structures may be used to impart a physical organization of the information when encoded on the storage medium. These advantageous structures may then provide functionality to the storage medium by affecting operations of one or more processors interacting with the information; for example, by increasing the efficiency of computer operations performed by the processor(s).
- these instructions may be executed on one or more suitable computing device(s) operating in any suitable computer system, including the exemplary computer systems described herein, or one or more computing devices (or one or more processors of one or more computing devices) may be programmed to execute the computer- executable instructions.
- a computing device or processor may be programmed to execute instructions when the instructions are stored in a manner accessible to the computing device/processor, such as in a local memory (e.g., an on-chip cache or instruction register, a computer-readable storage medium accessible via a bus, a computer-readable storage medium accessible via one or more networks and accessible by the device/processor, etc.).
- FIG.6 illustrates one exemplary implementation of a computing device in the form of a computing device 600 that may be used in a system implementing the techniques described herein, although others are possible.
- Computing device 600 may comprise one or more processors 602, a network adapter 604, and computer-readable storage media 606.
- Computing device 600 may be, for example, a desktop or laptop personal computer, a personal digital assistant (PDA), a smart mobile phone, a server, a wireless access point or other networking element, or any other suitable computing device.
- Network adapter 604 may be any suitable hardware and/or software to enable the computing device 600 to communicate wired and/or wirelessly with any other suitable computing device over any suitable computing network.
- the computing network may include wireless access points, switches, routers, gateways, and/or other networking equipment as well as any suitable wired and/or wireless communication medium or media for exchanging data between two or more computers, including the Internet.
- Computer-readable storage media 606 may be adapted to store data to be processed and/or instructions to be executed by one or more processors 602.
- the one or more processors 602 enable processing of data and execution of instructions.
- the data and instructions may be stored on the computer-readable storage media 606.
- the data and instructions stored on computer-readable storage media 606 may comprise computer-executable instructions implementing techniques which operate according to the principles described herein.
- Computer-readable storage media 606 may store an object generation facility 608 that implements one or more of the tools and/or methods described herein, and modeling data 610 comprising the objects, models, embeddings, and other data described herein.
- a computing device may additionally have one or more components and peripherals, including input and output devices. These devices can be used, among other things, to present a user interface. Examples of output devices that can be used to provide a user interface include printers or display screens for visual presentation of output and speakers or other sound generating devices for audible presentation of output.
- Examples of input devices that can be used for a user interface include keyboards, and pointing devices, such as mice, touch pads, and digitizing tablets.
- a computing device may receive input information through speech recognition or in other audible format.
- the present section introduces some examples of an innovative approach to 3D object generation that integrates deep 3D object understanding and workspace conditioning to produce high-quality 3D objects.
- This approach may include an ability to interpret creator requests and generate corresponding goals and constraints, which guide the object creation process.
- Some examples of the tools described herein include an expert system that acts as a virtual technical artist. In some examples, this system possesses an extensive knowledge of professional 3D object creation workflows and techniques. Utilizing this expertise, the expert system can make informed decisions and execute actions that replicate those of a skilled human artist.
- the “tool-forming” process involves dynamically configuring and manipulating the tools and resources within a professional 3D object creation workspace to achieve the desired outcomes.
- the expert system uses the goals and constraints derived from the creator’s request to steer the tool-forming process, ensuring that the final 3D object meets the specified requirements.
- workspace conditioning involves analyzing the creator’s workspace, which may contain concept art, reference images, design documents, and other relevant materials. By incorporating this information, the system can further refine the goals and constraints, leading to a 3D object that is not only of high quality but also closely aligned with the creator’s creative vision.
- the disclosed tools provide a comprehensive solution for 3D object generation that combines deep model understanding, creator-centric customization, and expert- guided creation.
- this solution results in a system capable of producing professional-grade 3D objects efficiently and effectively, catering to the needs of various industries, including gaming, film, and virtual or augmented reality.
- Terms The illustrative descriptions of the following terms are provided by way of example and are not limiting.
- Creator In some examples, a creator is an individual or entity that engages with AI tools through natural language requests and/or creative inputs for technical art assistance.
- Workspace In some examples, a workspace includes a collection of code, models, and other relevant materials used by the creator, optionally organized in a repository format common in game development and production projects.
- Assistant In some examples, an assistant includes an AI agent that interprets the creator’s intentions and translates them into computational operations for technical art creation.
- Expert System In some examples, an expert system includes an inference engine that structures inputs, knowledge, processes, and outputs, providing a framework for AI tools to make informed decisions in the technical art domain.
- 3D Object In some examples, a 3D object includes a digital object, model, and/or asset used in 3D environments, such as games or simulations, which can include textures, materials, and animations.
- Authoring Tool In some examples, an authoring tool includes a software application or tool used for creating 3D objects, which is learned and operated by AI agents through the process of “tool-forming,” as guided by the expert system.
- Knowledge Base includes a vector database (e.g., neural semantic database) that stores structured knowledge and information used by an expert system or AI model to make decisions and guide the creation of 3D objects.
- 3D Object Embeddings include feature representations extracted from 3D objects using multi-modal models (e.g., encoders), which capture various aspects of the models for use in AI processes.
- RAG Retrieval Augmented Generation
- RAC Retrieval Augmented Construction
- RAC is a technique analogous to RAG, whereby conditioning data retrieved from a source (e.g., vector database) external to a generative model (e.g., image-generating model) is used to condition the process by which the generative model generates content (e.g., images) used to construct 3D models.
- Tool-forming 3D Pipeline includes a pipeline specifically designed to create tools configured (e.g., optimized) for use by AI in 3D content creation. It incorporates expert systems for symbolic reasoning, synthetic data generation, and specialized servers for tasks like texture synthesis and handling complex multi- object scenarios.
- Workspace Conditioning includes an AI training method that tailors an AI model (e.g., conditioning model) to a user’s specific work environment. Workspace conditioning can involve multi-modal indexing of the user’s workspace, including files, documents, and models, to create a detailed contextual understanding.
- Multi-Modal Workspace Indexing is a process of indexing various types of data in the user’s workspace, such as filesystem repositories, documents, and existing models, to provide a comprehensive view of the workspace for AI conditioning.
- Adversarial Fine-Tuning A technique used to refine AI models by reducing reliance on human oversight. It is part of the workspace conditioning process.
- Deep ETL Extract, Transform, Load
- Synthetic Photogrammetry Models The creation of synthetic models using photogrammetry techniques, which are then used for conditioning AI models.
- Latent Space A multi-dimensional space where the internal representations of one or more AI tools’ data are stored. In some examples, the latent space is used for conditioning an AI tool’s outputs by directly manipulating its internal representations.
- An Example Construction Process [0253] 1. Creative Input: The creator engages with the AI tools by providing natural language requests and creative inputs through the Assistant. These inputs can include descriptions, concept art, reference images, and design documents. [0254] 2. Workspace Analysis: The system analyzes the creator’s Workspace, which contains code, models, and other relevant materials. This step involves Multi-Modal Workspace Indexing to create a detailed contextual understanding of the creator’s environment.
- 3. Goal and Constraint Generation: Based on the creative input and workspace analysis, the system generates goals and constraints for the 3D object creation process. This step involves the Expert System, which uses its structured knowledge to interpret the creator’s intentions. [0256] 4. Tool-forming: The system selects and configures the appropriate Authoring Tools for the task at hand. This process, known as “tool-forming,” is guided by the Expert System, which determines the best tools and settings to use based on the goals and constraints. [0257] 5. Synthetic Data Generation: To enhance the realism and accuracy of the 3D object, the system generates synthetic data, such as textures and materials, using techniques like Synthetic Photogrammetry Models. This data is used to condition AI models and improve the quality of the final model.
- [0258] 6. Object Creation: With the tools configured and synthetic data generated, the system begins the object creation process. This involves using the selected Authoring Tools to model, texture, and animate the 3D object according to the specified goals and constraints.
- 7. Adversarial Fine-Tuning: Throughout the object creation process, the system employs Adversarial Fine-Tuning to refine the AI models and reduce reliance on human oversight. This step ensures that the generated model meets the high-quality standards required for production.
- 8. Deep ETL (Extract, Transform, Load): The system uses a Reverse-Engineering ETL Model. This model helps in understanding the structure and properties of the models, which can be used for further refinement.
- 9. Latent Space Manipulation: The system fine-tunes the outputs of an AI tool by directly manipulating its internal representations in the Latent Space. This allows for precise adjustments to the 3D object, ensuring that it aligns with the creator’s vision and the technical requirements of the project.
- 10. Production-Ready Model: The final step involves generating a production-ready 3D object that meets the creator’s specifications and is suitable for use in games, simulations, or other 3D-graphics environments. This model is then delivered to the creator for integration into their project.
- the Creative Input step is the initial phase where the creator engages with the AI tools to provide the foundational inputs for the 3D object creation process. This engagement can occur through various channels, including natural language requests, creative inputs, and point-and- click 3D applications. The inputs provided by the creator may shape the direction and outcome of the 3D object creation process.
- Natural Language Requests Creators can communicate their requirements and specifications for the 3D object using natural language. This is facilitated by large language models that interpret the creator’s requests and translate them into structured instructions for some examples of an AI system. The use of natural language allows creators to express their ideas and visions in a flexible and intuitive manner.
- Creative Inputs In addition to textual descriptions, creators can provide a range of creative inputs to further define their vision for the 3D object. These inputs can include concept art, reference images, design documents, and even music and sound effects. These materials help to provide a richer context for the object creation process and enable some examples of an AI system to better understand the desired aesthetic and functional attributes of the final model.
- Point-and-Click 3D Applications For a more interactive engagement, creators can use point-and-click 3D applications to provide inputs. These applications allow creators to visually select and manipulate elements in a 3D environment, offering a more direct and intuitive way to convey their requirements for the model.
- the Creative Input step may help establish a clear and comprehensive understanding of the creator’s vision and requirements.
- By combining natural language requests, creative inputs, and interactive 3D applications, the AI tools can accurately interpret the creator’s intentions and set the stage for the subsequent steps in the object creation process.
- Workspace Analysis The AI tools systematically examine the creator’s Workspace, which encompasses the code, models, and other materials relevant to the 3D object creation project. This analysis may help the tools to understand the context and constraints within which the model is being developed. Some examples of workspace content are shown in FIGS. 7 and 8.
- Workspace conditioning is an innovative AI training method that tailors AI models to a user’s specific work environment. It involves multi-modal indexing of the user's workspace, including files, documents, and objects, to create a detailed contextual understanding. Embeddings generated from this data are stored in a vector database, allowing for efficient neural searching. In some examples, AI models are further refined through adversarial fine- tuning, reducing reliance on human oversight. A reverse-engineering ETL model offers deep insights into game objects.
- Multi-Modal Workspace Indexing The system employs a technique known as Multi- Modal Workspace Indexing to create a comprehensive overview of the creator’s environment.
- Multi-modal workspace indexing may include indexing of the file system, repositories, documents, game concepts, existing objects, etc.
- indexing of the user- space may include indexing of cloud workspaces, uploaded owned objects, fine-tuning image datasets, bring-your-own-models, etc.
- indexing may provide deep understanding of objects in any suitable format (e.g., OBJ, FBX, GLTF, etc.).
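- A bare-bones sketch of such indexing is given below: it walks a workspace directory, embeds each supported asset with a hypothetical multi-modal encoder `embed_asset`, and collects (embedding, path) pairs that could then be loaded into the vector database for neural searching. The extension list and in-memory index are illustrative only.

```python
from pathlib import Path

SUPPORTED = {".obj", ".fbx", ".gltf", ".glb", ".png", ".jpg", ".md", ".json"}

def index_workspace(root: str, embed_asset) -> list:
    """Walk the workspace and return (embedding, path) pairs for supported assets."""
    index = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in SUPPORTED:
            index.append((embed_asset(path), str(path)))
    return index
```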
- Contextual Understanding By analyzing the workspace, the system gains insights into the creator’s project, including the style, themes, and technical specifications that might influence the design and development of the 3D object. This contextual understanding may be helpful in tailoring the object creation process to fit seamlessly into the larger project.
- [0276] Integration with Creative Input: The information gleaned from the workspace analysis is integrated with the creative inputs provided by the creator in the previous step. This integration may ensure that the goals and constraints for the object creation are aligned with both the creator’s vision and the practical aspects of the project environment.
- the system translates the creator’s vision and project context into actionable parameters that guide the development of the model.
- Expert System Involvement The Expert System uses its structured knowledge base to interpret the creator’s intentions and the contextual information from the workspace analysis. The system then generates a set of goals that outline what the final model should achieve and constraints that define the boundaries within which the model is developed.
- Structured Knowledge Utilization The Expert System utilizes its structured knowledge to ensure that the goals and constraints are comprehensive and relevant. This knowledge includes industry standards, best practices, and technical specifications related to 3D object creation. By leveraging this knowledge, the system can generate goals and constraints that are realistic, achievable, and aligned with professional standards.
- the established goals and constraints serve as the foundation for the next step in the process, which is Tool-forming. They provide helpful guidance for selecting and configuring the appropriate authoring tools and techniques to create the 3D object.
- the Goal and Constraint Generation step may help translate the creator’s vision and project requirements into a structured framework that guides the entire object creation process. By clearly defining what the model should achieve and the parameters within which it should be developed, this step ensures that the final product is closely aligned with the creator’s expectations and project needs.
- Tool-forming [0288] In some examples of the Tool-forming step, the AI tools select and configure the appropriate authoring tools based on the goals and constraints established in the previous phase.
- Fig.25A shows an example illustration of a portion of a tool-forming process.
- Expert System Guidance The Expert System may determine the best tools and settings to use for the object creation process. It utilizes its structured knowledge base to match the specific goals and constraints with the capabilities of various authoring tools. Expert systems for symbolic reasoning may include decision trees and/or behavior trees. Such expert systems may help to solve the alignment problem.
- Selection of Authoring Tools Based on the requirements of the 3D object, the system selects suitable authoring tools from a range of options. These tools can include software for modeling, texturing, rigging, animation, and other aspects of 3D object creation.
- Tools for synthetic data generation may include Blender3D, UnrealEngine, etc.
- a Blender3D tool server may provide customized diffusion and procedural models, tools for texture synthesis (e.g., depths, normals, semseg, etc.), and decision trees for multi-objects and/or multi-tasks (e.g., uv-mapping, texture painting, etc.).
- inputs and outputs to such models and tools may be provided in JSON format.
- the Blender3D tool server may provide support for JSON as an intermediate language.
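- The request below illustrates what such a JSON exchange might look like; the endpoint address, tool name, and field names are entirely hypothetical, since the disclosure only states that inputs and outputs may be expressed in JSON.

```python
import json
import urllib.request

request_body = {
    "tool": "texture_synthesis",                      # hypothetical tool identifier
    "inputs": {"mesh": "assets/chair.obj", "maps": ["depth", "normal", "semseg"]},
    "parameters": {"resolution": 1024, "seed": 42},
}
req = urllib.request.Request(
    "http://localhost:8000/blender-tools",            # placeholder tool-server address
    data=json.dumps(request_body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read())
print(result.get("outputs"))
```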
- synthesis tools and techniques may include a photogrammetry synthesis viewer, synthetic photogrammetry objects generation tools, synthesized RGB-D + normal maps, a ControlNet-like workflow, NeRF conditioning tools, etc.
- Preparation for Object creation The tool-forming step prepares the groundwork for the actual creation of the 3D object. With the tools selected and configured, the system is now equipped to start modeling, texturing, and animating the model according to the specified goals and constraints.
- the Tool-forming step may ensure that the technical tools and resources are properly aligned with the creative and project requirements. By carefully selecting and configuring the authoring tools, the system sets the stage for the efficient and effective creation of the 3D object. [0297] Fig.
- the Tool-forming 3D pipeline is specifically designed to create tools that are configured (e.g., optimized) for use by AI in 3D content creation. It incorporates expert systems for symbolic reasoning, using decision and behavior trees to ensure AI decisions are informed and aligned with objectives. Synthetic data generation, facilitated by platforms like Blender3D and UnrealEngine, is used for conditioning AI models and achieving realistic 3D objects.
- the pipeline includes a specialized Blender3D tool server, equipped with a JSON intermediate language and customized models for tasks like texture synthesis and handling complex multi-object scenarios.
- the Photogrammetry tool server enhances surface reconstruction and enables novel view synthesis.
- a photogrammetry tool server may provide tools for customized surface reconstruction, procedural models, tools for novel view and shape synthesis, etc.
- Synthetic Data Generation The AI tools create synthetic data, such as textures and materials, to enhance the realism and accuracy of the 3D object. This process may help condition AI models and improve the quality of the final model.
- Use of Synthetic Photogrammetry Models The generation of Synthetic Photogrammetry Models involves creating realistic textures, materials, and other model components using photogrammetry techniques. These synthetic models are used to provide detailed and accurate representations of real-world objects and surfaces.
- Conditioning AI Models The synthetic data generated in this step is used to condition AI models involved in the object creation process. By training the AI models with this data, the system can improve their ability to produce high-quality, realistic 3D objects. The conditioning process helps the AI models learn from the synthetic data and apply this knowledge to the creation of the actual model.
- Integration with Tool-forming The synthetic data is integrated with the authoring tools selected and configured in the Tool-forming step. This integration allows the tools to utilize the synthetic data for tasks such as texture mapping, material application, and other aspects of object creation that require realistic data.
- Enhancement of Model Quality The use of synthetic data in the object creation process results in enhanced quality of the final 3D object.
- Preparation for Object creation The generation of synthetic data prepares the system for the actual creation of the 3D object in the next step. With the synthetic data ready, the system can proceed to model, texture, and animate the model with a focus on achieving the highest possible quality.
- the Synthetic Data Generation step may ensure that the AI models and authoring tools have access to high-quality, realistic data. This step may contribute to the overall quality and realism of the final 3D object.
- the Object creation step is where the actual development of the 3D object takes place. Utilizing the authoring tools selected and configured in the Tool-forming step, and incorporating the synthetic data generated in the Synthetic Data Generation step, the system begins to model, texture, and animate the 3D object according to the specified goals and constraints.
- [0308] Modeling: The first task in object creation is modeling, where the basic shape and structure of the 3D object are constructed. Using the chosen authoring tools, the system creates the geometric form of the model, ensuring that it aligns with the aesthetic and functional requirements outlined in the goals.
- [0309] Texturing: Once the model is complete, the next step is texturing.
- the synthetic textures and materials generated earlier are applied to the model to give it a realistic appearance.
- the system carefully maps these textures onto the model, paying attention to details such as color, reflectivity, and surface texture.
- Rigging and Animation If the 3D object requires animation, the system proceeds to rigging, where a skeleton structure is created to control the movement of the object. Following rigging, the 3D object is animated according to the specified goals, which might include specific actions, behaviors, or movements.
- Quality Assurance Throughout the object creation process, the system continuously checks the quality of the object to ensure that it meets the established standards.
- Object creation is often an iterative process, where the model undergoes multiple rounds of refinement based on feedback and evaluation. The system adjusts and improves the model as appropriate to ensure that the final product meets the creator’s expectations and project requirements.
- the Object creation step is part of the AI object creation process, where the 3D object is brought to life through a combination of modeling, texturing, rigging, and animation.
- Adversarial Fine-Tuning may involve using the AI models in the creation of the 3D object.
- the AI models are refined to reduce reliance on human oversight and improve the quality of the final product.
- the use of adversarial fine-tuning may reduce or eliminate the need for human-in-the-loop activity.
- Refinement of AI Models In this step, the AI models involved in the object creation process undergo fine-tuning to enhance their performance.
- Adversarial Training This fine-tuning process may involve the use of adversarial training techniques. These techniques involve presenting the AI models with challenging scenarios or adversarial examples that force the models to learn more robust and accurate representations. This helps the models become more resilient and capable of handling a wider range of object creation tasks.
- Reduction of Human Oversight By fine-tuning the AI models, the system aims to reduce the need for human oversight in the object creation process. The goal is to create a more autonomous system that can produce high-quality models with minimal intervention from human creators or technicians.
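- As a rough, non-authoritative sketch of one way adversarial fine-tuning could be arranged, the PyTorch snippet below trains a small critic to distinguish reference renders from generated ones while the generator is updated to fool it. The toy network sizes, the flattened image tensors, and the use of random data as a stand-in for real renders are all assumptions made purely for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: `generator` maps a latent code to an image-like
# tensor; `critic` scores how plausible a rendering looks.
generator = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 3 * 32 * 32))
critic = nn.Sequential(nn.Linear(3 * 32 * 32, 128), nn.ReLU(), nn.Linear(128, 1))

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-4)
d_opt = torch.optim.Adam(critic.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def adversarial_step(real_views: torch.Tensor) -> None:
    """One adversarial fine-tuning step on a batch of reference renders."""
    z = torch.randn(real_views.size(0), 64)
    fake_views = generator(z)

    # Critic: learn to distinguish reference renders from generated ones.
    d_loss = bce(critic(real_views), torch.ones(real_views.size(0), 1)) + \
             bce(critic(fake_views.detach()), torch.zeros(real_views.size(0), 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator: produce renders the critic accepts, reducing reliance on
    # human review of every output.
    g_loss = bce(critic(fake_views), torch.ones(real_views.size(0), 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

# Example usage with a random batch standing in for reference renders.
adversarial_step(torch.randn(8, 3 * 32 * 32))
```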
- Deep ETL (Extract, Transform, Load)
- the Deep ETL (Extract, Transform, Load) step is where the system delves into the intricacies of the 3D objects being created. This step involves a thorough analysis of the data extraction, transformation, and loading processes to gain a comprehensive understanding of the models’ structure and properties.
- the use of ETL modeling may provide deep understanding of game objects.
- pre- and post-processing of objects may be provided.
- Extraction The system begins by extracting data from the 3D objects, which includes geometric details, textures, materials, and other pertinent attributes. This data is extracted in a structured manner to facilitate further analysis and processing.
- Transformation The extracted data undergoes transformation to enhance its utility and relevance for the object creation process. This may involve changing data formats, normalizing values, or applying other transformations to prepare the data for in-depth analysis and integration.
- Loading After transformation, the data is loaded into the system’s knowledge base or other storage solutions for subsequent use. The loaded data enriches the system’s structured knowledge, contributing to a more profound understanding of 3D objects and their characteristics.
- Insights and Refinement Leveraging the insights gained from the Deep ETL process, the system refines the object creation process. A more detailed understanding of the models’ structure and properties allows for informed decision-making and adjustments, resulting in enhanced quality and accuracy of the final models.
- Integration with Object creation The insights and data derived from the Deep ETL process are integrated into the ongoing object creation process. This integration enables real-time adjustments and improvements based on the comprehensive understanding of the models being created.
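- A minimal, illustrative sketch of such an extract-transform-load flow is shown below. The raw asset dictionary, its field names, and the JSON-lines knowledge store are hypothetical placeholders rather than the formats of any particular tool described herein.

```python
import json
from dataclasses import dataclass, asdict
from typing import List

@dataclass
class ObjectRecord:
    name: str
    vertex_count: int
    extents: List[float]      # bounding-box size, in scene units
    materials: List[str]

def extract(raw_asset: dict) -> ObjectRecord:
    """Extract: pull geometry and material attributes out of a raw asset
    description (the asset format here is hypothetical)."""
    return ObjectRecord(
        name=raw_asset["name"],
        vertex_count=len(raw_asset["vertices"]),
        extents=raw_asset["bounds"],
        materials=[m["name"] for m in raw_asset.get("materials", [])],
    )

def transform(record: ObjectRecord) -> dict:
    """Transform: normalize units and flatten the record for storage."""
    scale = max(record.extents) or 1.0
    row = asdict(record)
    row["extents"] = [e / scale for e in record.extents]   # unit-normalized
    return row

def load(rows: List[dict], path: str) -> None:
    """Load: append transformed rows to a JSON-lines knowledge store."""
    with open(path, "a", encoding="utf-8") as fh:
        for row in rows:
            fh.write(json.dumps(row) + "\n")

raw = {"name": "crate", "vertices": [[0, 0, 0], [1, 0, 0], [1, 1, 1]],
       "bounds": [1.0, 1.0, 1.0], "materials": [{"name": "oak"}]}
load([transform(extract(raw))], "object_knowledge.jsonl")
```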
- the Deep ETL (Extract, Transform, Load) step may help achieve a deeper understanding of the 3D objects and their components by meticulously analyzing the data extracted from them.
- Latent Space Manipulation is where the system fine-tunes the outputs of an AI tool by directly manipulating its internal representations. This step may help ensure that the final 3D object aligns precisely with the creator’s vision and the technical requirements of the project.
- Accessing the Latent Space The system accesses the latent space of the AI models, which is a multi-dimensional space where the internal representations of data of one or more AI tools are stored. This space contains the encoded features and characteristics of the 3D objects being created.
- Direct Manipulation The system directly manipulates the representations in the latent space to adjust specific aspects of the 3D object. This can involve tweaking features related to the model’s shape, texture, color, or other attributes to achieve the desired outcomes.
- Precision and Control Latent space manipulation provides a high degree of precision and control over the object creation process. By making targeted adjustments in the latent space, the system can achieve subtle and precise modifications that are not easily attainable through traditional methods.
- Alignment with Goals and Constraints The manipulations in the latent space are guided by the goals and constraints established earlier in the process. This ensures that the adjustments are consistent with the creator’s requirements and the project’s technical specifications.
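- The sketch below illustrates the general idea of latent space manipulation, assuming the latent code and the attribute direction (e.g., "more weathered surface") come from an AI tool's encoder; random vectors stand in for them here, and this is not the system's actual implementation.

```python
import numpy as np

# Hypothetical latent code for a generated object and a learned attribute
# direction; both would normally come from an AI tool's encoder.
rng = np.random.default_rng(0)
latent = rng.standard_normal(512)
weathered_direction = rng.standard_normal(512)
weathered_direction /= np.linalg.norm(weathered_direction)

def adjust(code: np.ndarray, direction: np.ndarray, strength: float) -> np.ndarray:
    """Nudge a latent code along an attribute direction."""
    return code + strength * direction

def interpolate(a: np.ndarray, b: np.ndarray, t: float) -> np.ndarray:
    """Linear interpolation between two latent codes (0 <= t <= 1)."""
    return (1.0 - t) * a + t * b

slightly_weathered = adjust(latent, weathered_direction, strength=0.5)
halfway = interpolate(latent, slightly_weathered, t=0.5)
```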
- the Production-Ready Model step is the culmination of the AI object creation process, where the final 3D object is prepared for integration into the creator’s project. This step ensures that the model meets all the specified requirements and is suitable for use in games, simulations, or other 3D environments.
- Final Quality Assurance Before declaring the model as production-ready, the system performs a final round of quality assurance checks. This includes verifying that the model adheres to the established goals and constraints, ensuring that it is technically sound, and confirming that it meets the desired aesthetic standards.
- Optimization for Performance The model undergoes optimization to ensure that it performs well in the intended environment.
- Packaging and Exporting The production-ready model is packaged and exported in a format that is compatible with the creator’s project. This includes organizing the model’s components, such as models, textures, and animations, into a cohesive package that can be easily integrated into the project’s workflow.
- Delivery to the Creator The final model is delivered to the creator, along with any helpful documentation or metadata. The creator can then incorporate the model into their project, where it can be used as intended in the game, simulation, or other 3D environment.
- Post-Delivery Support After the model is delivered, the system may provide post- delivery support to address any issues or adjustments that the creator may require.
- the Production-Ready Model step is the final phase in the AI object creation process, marking the transition of the 3D object from development to deployment. By ensuring that the object is of high quality, configured (e.g., optimized) for performance, and ready for integration, this step completes the journey from creative input to a tangible, production-ready object.
- Retrieval Augmented Construction (RAC) for Few-Shot 3D object Details [0348] RAC can be used to semantically retrieve 3D object details that closely match the user’s request.
- This process can involve texture and material semantic matching, rig and mesh matching, etc.
- Some examples of block diagrams illustrating retrieval augmented construction processes are shown in Fig.9.
- Texture and Material Semantic Matching RAC identifies textures and materials that are semantically related to the user’s request. For example, if the user specifies a “wooden” material, RAC retrieves textures that resemble wood grain patterns.
- Rig and Mesh Matching RAC finds rigs that are compatible with similar meshes based on the user’s specifications.
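- One possible, simplified form of such semantic matching is sketched below: a request embedding is compared to library embeddings by cosine similarity and the closest entries are returned. The tiny in-memory library and three-dimensional embeddings are stand-ins for a real vector database and encoder, chosen only for illustration.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical library of texture embeddings; in practice these would be
# produced by the same encoder used for the user's request.
library = {
    "oak_planks":    np.array([0.9, 0.1, 0.0]),
    "pine_grain":    np.array([0.8, 0.2, 0.1]),
    "brushed_steel": np.array([0.0, 0.1, 0.9]),
}

def retrieve(request_embedding: np.ndarray, top_k: int = 2):
    """Return the texture names most semantically similar to the request."""
    scored = sorted(library.items(),
                    key=lambda kv: cosine(request_embedding, kv[1]),
                    reverse=True)
    return [name for name, _ in scored[:top_k]]

# A request embedding standing in for the phrase "wooden material".
print(retrieve(np.array([0.85, 0.15, 0.05])))   # ['oak_planks', 'pine_grain']
```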
- Photogrammetry for Novel Views and Surfaces Photogrammetry is used to capture detailed images of objects from multiple angles. The camera’s intrinsic parameters (e.g., focal length, sensor size) and extrinsic parameters (e.g., position, orientation) are meticulously calibrated to ensure accurate reconstruction of 3D objects.
- Data Generation for NeRF and Gaussian Splatter The images captured through photogrammetry are processed to generate data understandable by neural radiance fields (NeRF) and Gaussian splatter techniques. This data is used to create novel views and surfaces with high levels of detail and realism.
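- For reference, the snippet below sketches the standard pinhole projection that relates these intrinsic and extrinsic parameters to pixel coordinates; the numeric values are illustrative only and do not correspond to any particular capture rig.

```python
import numpy as np

def project(point_world: np.ndarray, K: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Project a 3D world point into pixel coordinates with a pinhole model.
    K: 3x3 intrinsics (focal lengths, principal point).
    R, t: extrinsics (world-to-camera rotation and translation)."""
    p_cam = R @ point_world + t       # world -> camera coordinates
    p_img = K @ p_cam                 # camera -> image plane
    return p_img[:2] / p_img[2]       # perspective divide -> pixel coordinates

# Illustrative values: focal length of 1500 pixels, image centre (960, 540),
# and a simple extrinsic placing the object 5 units in front of the camera.
K = np.array([[1500.0, 0.0, 960.0],
              [0.0, 1500.0, 540.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 5.0])

print(project(np.array([0.2, -0.1, 0.0]), K, R, t))   # -> [1020., 510.]
```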
- An example of a process of using photogrammetry to synthesize a target view of a 3D object is shown in Fig. 10.
- Improved Angle Selection for Photogrammetry
- AI tools select the best angles for capturing the 3D object based on image encoding scores and cosine similarity, functioning in a manner akin to Principal Component Analysis (PCA).
- Image Encoding Scores Each image captured from different angles is encoded using a neural network, producing a high-dimensional feature vector that represents the visual content of the image. These feature vectors are the image encoding scores, capturing key visual information of the object from various perspectives.
- Cosine Similarity Measurement In some examples, AI tools calculate the cosine similarity between the feature vectors of different images. Cosine similarity measures the cosine of the angle between two vectors, indicating their similarity in terms of visual content. A higher cosine similarity score suggests that the images capture similar features of the object.
- Based on these similarity scores, the AI tools select angles (e.g., optimal angles) that capture distinct and informative perspectives of the object (e.g., the most distinct and informative perspectives of the object).
- Capturing Images The camera is positioned at the selected angles to capture images of the object. These images are then used in the photogrammetry process to reconstruct the 3D object with high fidelity.
- By using image encoding scores and cosine similarity in a manner akin to PCA, our system ensures that the images used for photogrammetry are captured from angles that provide valuable information (e.g., the most valuable information) for accurate 3D reconstruction.
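- A simplified sketch of this kind of angle selection is shown below: candidate views are encoded, pairwise cosine similarities are computed, and views are chosen greedily so that each new view is as dissimilar as possible to those already selected. The random encodings and the greedy selection rule are illustrative assumptions, not the exact method used by the system.

```python
import numpy as np

def select_views(embeddings: np.ndarray, k: int) -> list:
    """Greedily pick k view indices whose encodings are least similar to the
    views already chosen (i.e., the most distinct perspectives)."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = normed @ normed.T                      # pairwise cosine similarity
    chosen = [int(np.argmin(sim.sum(axis=1)))]   # start with the most atypical view
    while len(chosen) < k:
        # For each candidate, find its highest similarity to any chosen view,
        # then pick the candidate for which that value is lowest.
        worst_case = sim[:, chosen].max(axis=1)
        worst_case[chosen] = np.inf              # never re-select a chosen view
        chosen.append(int(np.argmin(worst_case)))
    return chosen

# Hypothetical encodings of 24 candidate camera angles (e.g., CLIP-style
# feature vectors); random values stand in for real image encodings.
rng = np.random.default_rng(1)
candidate_encodings = rng.standard_normal((24, 128))
print(select_views(candidate_encodings, k=6))
```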
- Behavior trees can be used to structure the decision-making process of an expert system. Selectors can determine which action or sequence of actions to execute based on specific criteria.
- Actions are individual tasks performed by the system, such as selecting a tool or applying a texture. Composites combine multiple actions or selectors to form more complex behaviors.
- Decision Trees for Model Pipelines Decision trees are used to build model pipelines and tool-forming jobs. They help in deciding the sequence of steps required to create a 3D object, from initial modeling to final texturing and animation.
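- The following sketch shows a minimal behavior-tree structure of the kind described above (actions, sequences, and selectors); the specific tasks and the fallback ordering are hypothetical examples, not the system's actual pipeline logic.

```python
class Node:
    def tick(self) -> bool:
        raise NotImplementedError

class Action(Node):
    """A leaf task, e.g. selecting a tool or applying a texture."""
    def __init__(self, name, fn):
        self.name, self.fn = name, fn
    def tick(self) -> bool:
        return self.fn()

class Sequence(Node):
    """Composite: succeeds only if every child succeeds, in order."""
    def __init__(self, *children):
        self.children = children
    def tick(self) -> bool:
        return all(child.tick() for child in self.children)

class Selector(Node):
    """Composite: tries children in order and succeeds at the first success."""
    def __init__(self, *children):
        self.children = children
    def tick(self) -> bool:
        return any(child.tick() for child in self.children)

# Hypothetical pipeline: prefer a procedural texture, fall back to diffusion.
tree = Sequence(
    Action("select_tool", lambda: True),
    Selector(
        Action("procedural_texture", lambda: False),   # fails in this run
        Action("diffusion_texture", lambda: True),     # fallback succeeds
    ),
)
print(tree.tick())   # True
```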
- Conditioning the Model with Latent Space Manipulation [0368] Pre-Processing, Transforming, and Post-Processing: The generative process can involve conditioning the model through various stages. Pre-processing prepares the input data, transforming involves manipulating the latent spaces to achieve desired outcomes, and post- processing refines the generated 3D object.
- Diffusion Models In some examples, diffusion models are used to generate high-quality 3D objects by gradually transforming a random noise distribution into a structured output. This process is guided by the conditioned latent spaces, facilitating alignment of the final model with the user’s specifications and creative vision. Some examples of results of conditioning a model with latent space manipulation are shown in Fig.13.
- Scalable AI infrastructure [0371] Fig. 15 is a block diagram of an example of a scalable AI infrastructure for 3D model generation. The AI Infrastructure is a sophisticated framework designed for advanced AI applications, particularly in 3D content creation and machine learning operations (MLOps).
- The infrastructure provides dockerized MLOps with GPU-accelerated images and headless UnrealEngine and Blender3D integration, ensuring efficient and automated 3D object generation processes.
- the infrastructure supports AI training and inference through both cloud and local nodes, specifically on consumer GPUs, offering scalable and flexible computing resources. It encompasses multi-modal databases that store embeddings, datasets, and models for various AI tasks. Additionally, a large synthetics database is included, providing extensive synthetic datasets and a massive collection of generic 3D base objects, essential for training AI models and conditioning content creation.
- dockerized MLOps may provide GPU-accelerated image processing, headless Blender3D, work queues, etc.
- AI training and inference nodes may include cloud and local nodes, consumer GPUs, etc.
- multi-modal databases may store embeddings, datasets, models, etc.
- a synthetics database may store large synthetic datasets and support a massive server of generic 3D base objects.
- 3D Generative Object Pipeline [0374] The AI maintains an ever-growing base mesh database that it uses for fine-tuning models and early conditioning in the generation process. Some examples of base meshes are shown in Fig. 12.
- the base meshes are stored with a unified multi-modal neural embedding (text, audio, 3D, video, etc.), meaning all media encodes to the same intermediate representation. Together with a user query (or prompt), we find the nearest base objects required to kick off the generative pipeline.
- Fig.16 is a data flow diagram illustrating an example method for finding a base mesh that matches (e.g., most closely matches) a user query.
- Given the base objects and the user request, an autonomous task management agent searches for the best Tool Agent for the job.
- the initial tool agent may be the Blender Agent.
- Each tool agent has a decision tree that helps it navigate the complexity of a 3D pipeline.
- advanced Conditioning is applied to one or more AI models.
- Fig.17 is a block diagram illustrating a system for conditioning a digital object (e.g., an image or 3D model).
- a digital object e.g., an image or 3D model.
- cross-domain diffusion models are used. Such models can generate multi-view normal and color maps from single points of view.
- Fig.18 is a dataflow diagram illustrating an example method for generating multi-view normal and color maps from single points of view. The agents can enter into an iterative back and forth between synthetic and generative generation until they converge to a final result.
- a blender model is used.
- Fig. 25B shows a configuration of a blender that includes a stable diffusion model and a ControlNet model.
- Fig. 19 shows illustrations of intermediate and final states of 3D models of various objects.
- the AI reconstructs a scene (similar to photogrammetry, 3D laser scan, etc.). Given the original control base objects and the virtual camera intrinsics and extrinsics used to generate the synthetic viewpoints, the system can perform state-of-the-art 3D reconstruction back onto professionally rigged objects.
- Textures and Materials Generation Pipeline [0381] In some examples, an expert system simulates the decision-making ability of a technical artist expert during a process of generating textures and materials.
- [0382] Figs. 20A-20C show a flowchart of a texture generation process.
- FIG. 20A shows a flowchart of a texture generation process 2000
- FIG. 20B shows a flowchart of a normal map sub-process 2040 of the texture generation process 2000
- FIG.20C shows a flowchart of a segmentation map sub-process 2020 of the texture generation process 2000.
- the texture generation process uses stable diffusion for the automatic generation of seamless and unique features conditioned by segmentation maps for single and/or multiple 3D objects. Additionally or alternatively, generation of features may be conditioned by normal maps and/or ambient occlusion textures.
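- A comparable, openly available workflow can be sketched with the diffusers library, where a ControlNet conditions Stable Diffusion on a segmentation map. The checkpoint names and file paths below are public stand-ins used only for illustration; this approximates the kind of conditioning described and is not the system's own pipeline.

```python
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Public checkpoints used as stand-ins for the customized models described above.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-seg", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16).to("cuda")

# A segmentation map of the object's UV layout (hypothetical file name); the
# diffusion process is conditioned on it so generated texture regions line up
# with the object's parts.
seg_map = Image.open("segmentation_map.png").convert("RGB")

texture = pipe(
    prompt="seamless weathered oak plank texture, tileable, photorealistic",
    image=seg_map,
    num_inference_steps=30,
).images[0]
texture.save("generated_texture.png")
```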
- Fig. 20D shows a flowchart of a UV map node arrangement process for unconventional materials.
- Fig.20E shows a flowchart of another texture generation process.
- the use of an expert system in-the-loop goes beyond the traditional human-in-the-loop model used for training models like ChatGPT 4, integrating the expertise of look development professionals directly into the AI tool’s learning and decision-making processes.
- the AI models can gain an unprecedented depth of aesthetic and design intelligence, enabling them to handle complex visual tasks with a level of detail and accuracy unmatched in the industry.
- the expert system in-the-loop guides the AI to operate with an understanding of visual artistry and technical precision.
- an AI framework is structured as a computation graph, a network of interconnected nodes where each node represents a distinct computational operation or a step in an AI tool's learning process. This graph is intricately designed to integrate the insights from the expert system at every stage, ensuring that the AI tool's learning and decision-making are continually influenced by expert knowledge.
- Fig. 21 is a block diagram illustrating an example of some of the operations performed by a tool-forming 3D pipeline while generating a 3D model of a car.
- Fig. 22 shows examples of user interfaces of an AI tool for styling and upscaling a 3D model.
- the latent space of one or more AI tools is exposed.
- the latent space is a multi-dimensional space where an AI tool's internal representations of data are stored.
- By making this space accessible at every node of the computation graph, we allow for an unprecedented level of transparency and control.
- This design enables both human experts and the LookDev-expert-system to fine-tune the AI tool's outputs by directly manipulating its internal representations.
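- One way such a computation graph with exposed latents might be organized is sketched below: each node runs its operation and then passes the latent through registered hooks, through which an expert system (or a human) can inspect or adjust it. The node operations and the clamping hook are purely illustrative assumptions.

```python
from typing import Callable, List
import numpy as np

class GraphNode:
    """One computational step whose latent output is exposed for inspection
    and adjustment before it flows to the next node."""
    def __init__(self, name: str, op: Callable[[np.ndarray], np.ndarray]):
        self.name, self.op = name, op
        self.hooks: List[Callable[[str, np.ndarray], np.ndarray]] = []

    def add_hook(self, hook: Callable[[str, np.ndarray], np.ndarray]) -> None:
        """Register an expert-system (or human) hook that may edit the latent."""
        self.hooks.append(hook)

    def run(self, latent: np.ndarray) -> np.ndarray:
        latent = self.op(latent)
        for hook in self.hooks:
            latent = hook(self.name, latent)
        return latent

def lookdev_hook(node_name: str, latent: np.ndarray) -> np.ndarray:
    # Hypothetical adjustment: clamp extreme activations flagged by the expert system.
    return np.clip(latent, -3.0, 3.0)

graph = [GraphNode("encode", lambda z: z * 2.0),
         GraphNode("stylize", lambda z: z + 0.1)]
for node in graph:
    node.add_hook(lookdev_hook)

latent = np.zeros(8)
for node in graph:
    latent = node.run(latent)
print(latent)
```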
- Fig. 23A shows an example of a user interface for a sampler tool.
- the latent space can be examined and adjusted.
- Fig. 23B shows an example of a user interface for a 3D modeling tool.
- Figs. 24A-24C show sequences of inputs to a 3D model generation tool and outputs from the tool during the generation and conditioning of a model of a sofa.
- Figs.24D-24F show sequences of inputs to a 3D model generation tool and outputs from the tool during the generation and conditioning of a model of a tiger head.
- Figs.24G-24I show sequences of inputs to a 3D model generation tool and outputs from the tool during the generation and conditioning of a model of a treasure chest.
- Fast Generative AI Tools [0397] Achieving high-performance game object generation in the system involves a combination of advanced techniques to increase efficiency and speed of operation, together with the use of efficient programming and processing tools. In some cases, such a process may include one or more of the following:
- [0398] Operation with Rust Programming Language: Rust is known for its high performance and safety, particularly in systems programming. By utilizing Rust, we ensure that the object generation pipeline is not only fast but also reliable and secure.
- GStreamer, a powerful multimedia framework, is used for handling various media processing tasks. This tool is particularly effective for managing and manipulating audio and video files, which are integral parts of game objects. GStreamer's pipeline-based structure allows for high customization and optimization, making it an ideal choice for complex media operations required in game object generation.
- GPU Acceleration for Enhanced Performance Alongside porting tasks to the GPU, we also implement GPU acceleration techniques. This involves, in some examples, optimizing AI tools and processes to take full advantage of the GPU's architecture.
- the Blender and UnrealEngine headless servers also make full use of GPU acceleration inside their containers.
- Embodiments have been described where the techniques are implemented in circuitry and/or computer-executable instructions. It should be appreciated that some embodiments may be in the form of a method, of which at least one example has been provided. The acts performed as part of the method may be ordered in any suitable way. Accordingly, embodiments may be constructed in which acts are performed in an order different than illustrated, which may include performing some acts simultaneously, even though shown as sequential acts in illustrative embodiments.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Geometry (AREA)
- Computer Graphics (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Computer Hardware Design (AREA)
- Medical Informatics (AREA)
- Data Mining & Analysis (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Mathematical Physics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Health & Medical Sciences (AREA)
- Pure & Applied Mathematics (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Processing Or Creating Images (AREA)
- Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
Abstract
A three-dimensional modeling method may include (a) obtaining a first plurality of encodings of a first plurality of images of an object; (b) generating, by the one or more processors and one or more image-generating models, a second plurality of encodings of a second plurality of images of the object, the second plurality of encodings being based on the first plurality of encodings and on conditioning data; and (c) constructing, by the one or more processors based on the second plurality of encodings, a 3D model of the object, the 3D model indicating (1) a geometric form of the object and (2) a texture and/or material of at least one surface of the object. Steps (a) – (c) of the modeling method may be repeated until the one or more processors determine that the 3D model of the object satisfies one or more criteria.
Description
Docket No. 212860-700001/PCT SYSTEMS AND METHODS FOR AI-ASSISTED CONSTRUCTION OF THREE- DIMENSIONAL MODELS CROSS-REFERENCE TO RELATED APPLICATION(S) [0001] This application claims the benefit of U.S. Provisional Application No.63/566,361, filed March 17, 2024, the entire contents of which are hereby incorporated by reference herein. BACKGROUND [0002] Computing devices may render three-dimensional (3D) objects (sometimes referred to herein as “3D graphical objects,” “3D digital objects,” “3D models,” or simply “objects” or “models”) for various purposes. For example, a computing device may render 3D graphical objects for use in video games, simulations, and/or online environments. Such 3D graphical objects may be generated in a variety of different ways. In one example, a computing device may generate a 3D graphical object based on input received from a human operator via a computer- aided design (CAD) tool. In another example, a computing device may generate a 3D graphical object based on images and/or scans of a corresponding tangible, real-world object. SUMMARY [0003] In some aspects, the techniques described herein relate to a three-dimensional modeling method including: (a) obtaining, by one or more processors, a first plurality of encodings of a first plurality of images of an object; (b) generating, by the one or more processors and one or more image-generating models, a second plurality of encodings of a second plurality of images of the object, the second plurality of encodings being based on the first plurality of encodings and on conditioning data; (c) constructing, by the one or more processors based on the second plurality of encodings, a 3D model of the object, the 3D model indicating (1) a geometric form of the object and (2) a texture and/or material of at least one surface of the object; and repeating steps (a) - (c) until the one or more processors determine that the 3D model of the object satisfies one or more criteria. [0004] In some aspects, the techniques described herein relate to a method, wherein the 3D model of the object is a second 3D model of the object, the method further including: obtaining a ACTIVE 708712445v1 1
Docket No. 212860-700001/PCT first 3D model of the object; and rendering, from a plurality of viewpoints of the first 3D model of the object, the first plurality of images of the object. [0005] In some aspects, the techniques described herein relate to a method, wherein the first 3D model of the object is obtained from a vector database based on a description of the object. [0006] In some aspects, the techniques described herein relate to a method, further including: obtaining, from a vector database based on a description of the object, the first plurality of images of the object. [0007] In some aspects, the techniques described herein relate to a method, wherein the 3D model of the object is a second 3D model of the object, the method further including: obtaining a first 3D model of the object; rendering, from a set of viewpoints of the first 3D model of the object, a set of views of the object; calculating a plurality of scores based on encodings of views in the set of views, wherein the scores indicate an amount of similarity between or among groups of two or more views in the set of views; and obtaining, based on the plurality of scores, the first plurality of images of the object. [0008] In some aspects, the techniques described herein relate to a method, wherein obtaining the first plurality of images includes: selecting, based on the plurality of scores, a subset of the set of views, wherein the first plurality of images is the selected subset of the set of views. [0009] In some aspects, the techniques described herein relate to a method, wherein the set of views is a first set of views, the set of viewpoints is a first set of viewpoints, and obtaining the first plurality of images of the object includes: determining, based on the plurality of scores, a second set of viewpoints of the object; and rendering, by the one or more processors, a second set of views of the first 3D model of the object from the second set of viewpoints, wherein the first plurality of images is the second set of views. [0010] In some aspects, the techniques described herein relate to a method, wherein: the encodings of the views in the set of views are embeddings of the views in the set of views, and the amount of similarity between or among a respective group of two or more views is determined based on cosine similarity between the embeddings of respective views of the two or more views. [0011] In some aspects, the techniques described herein relate to a method, wherein a total number of images in the first plurality of images is less than a total number of views in the set of views. ACTIVE 708712445v1 2
Docket No. 212860-700001/PCT [0012] In some aspects, the techniques described herein relate to a method, wherein the first plurality of encodings is a first plurality of embeddings, the second plurality of encodings is a second plurality of embeddings, and the generating the second plurality of embeddings includes: conditioning an image generation process performed in a latent space of the one or more image- generating models on the first plurality of embeddings and on one or more embeddings obtained from a vector database based on the conditioning data. [0013] In some aspects, the techniques described herein relate to a method, wherein the conditioning data include a description of the object. [0014] In some aspects, the techniques described herein relate to a method, wherein the description of the object includes a description of one or more geometric attributes of the object, one or more visual attributes of the object, and/or one or more optical attributes of the object. [0015] In some aspects, the techniques described herein relate to a method, wherein the conditioning data include a description of one or more alterations to (1) an aesthetic of the object as depicted in the first plurality of images or represented in the first 3D model of the object, (2) a geometric attribute of the object as depicted in the first plurality of images or represented in the first 3D model of the object, (3) a visual attribute of the object as depicted in the first plurality of images or represented in the first 3D model of the object, and/or (4) an optical attribute of the object as depicted in the first plurality of images or represented in the first 3D model of the object. [0016] In some aspects, the techniques described herein relate to a method, wherein the generating the second plurality of embeddings further includes providing, by the one or more processors, the first plurality of embeddings as input to the one or more image-generating models. [0017] In some aspects, the techniques described herein relate to a method, wherein the one or more image-generating models include a latent, text-to-image diffusion model. [0018] In some aspects, the techniques described herein relate to a method, wherein the geometric form of the 3D model includes one or more shapes, dimensions, and/or orientations of one or more portions of the 3D model. [0019] In some aspects, the techniques described herein relate to a method, wherein one or more visual attributes and/or optical attributes of the object includes a texture, material, shading, lighting, reflectivity, and/or color of a surface of the object. ACTIVE 708712445v1 3
Docket No. 212860-700001/PCT [0020] In some aspects, the techniques described herein relate to a method, wherein the one or more criteria include (1) receipt of user input indicating that the 3D model of the object is satisfactory, (2) receipt of user input requesting termination of modeling, (3) expiry of a maximum time period allocated for the modeling, and/or (4) use of a maximum amount of computational resources allocated for the modeling. [0021] In some aspects, the techniques described herein relate to at least one computer-readable storage medium encoded with computer-executable instructions that, when executed by at least one processor, cause the at least one processor to perform operations including: (a) obtaining a first plurality of encodings of a first plurality of images of an object; (b) generating, by one or more image-generating models, a second plurality of encodings of a second plurality of images of the object, the second plurality of encodings being based on the first plurality of encodings and on conditioning data; (c) constructing, based on the second plurality of encodings, a 3D model of the object, the 3D model indicating (1) a geometric form of the object and (2) a texture and/or material of at least one surface of the object; and repeating steps (a) - (c) until the at least one processor determines that the 3D model of the object satisfies one or more criteria. [0022] In some aspects, the techniques described herein relate to a system including: at least one processor; and at least one storage medium having encoded thereon executable instructions that, when executed by the at least one processor, cause the at least one processor to carry out a method including: (a) obtaining a first plurality of encodings of a first plurality of images of an object; (b) generating, by one or more image-generating models, a second plurality of encodings of a second plurality of images of the object, the second plurality of encodings being based on the first plurality of encodings and on conditioning data; (c) constructing, based on the second plurality of encodings, a 3D model of the object, the 3D model indicating (1) a geometric form of the object and (2) a texture and/or material of at least one surface of the object; and repeating steps (a) - (c) until the at least one processor determines that the 3D model of the object satisfies one or more criteria. [0023] In some aspects, the techniques described herein relate to a three-dimensional modeling method including: obtaining, by one or more processors, a plurality of encodings of a first plurality of images of an object; calculating, by the one or more processors, a plurality of scores based on the plurality of encodings, wherein each score indicates an amount of similarity between or among a respective group of two or more images of the first plurality of images; ACTIVE 708712445v1 4
Docket No. 212860-700001/PCT obtaining, by the one or more processors and based on the plurality of scores, a second plurality of images; and reconstructing, by the one or more processors and based on the second plurality of images, a three-dimensional (3D) model of the object. [0024] In some aspects, the techniques described herein relate to a method, wherein the first plurality of images depict a plurality of views of the object, and wherein the object is a virtual 3D object or a physical 3D object. [0025] In some aspects, the techniques described herein relate to a method, wherein the 3D model of the object is a second 3D model, and wherein obtaining the first plurality of images of the object includes: rendering, by the one or more processors, a plurality of views of a first 3D model of the object from a plurality of viewpoints, wherein the first plurality of images include the plurality of views. [0026] In some aspects, the techniques described herein relate to a method, wherein obtaining the second plurality of images includes: selecting, by the one or more processors and based on the plurality of scores, a subset of the first plurality of images, wherein the second plurality of images is the selected subset of the first plurality of images. [0027] In some aspects, the techniques described herein relate to a method, wherein the plurality of views is a first plurality of views, the plurality of viewpoints is a first plurality of viewpoints, and obtaining the second plurality of images of the object includes: determining, by the one or more processors and based on the plurality of scores, a second plurality of viewpoints of the object; and rendering, by the one or more processors, a second plurality of views of the first 3D model of the object from the second plurality of viewpoints, wherein the second plurality of images is the second plurality of views. [0028] In some aspects, the techniques described herein relate to a method, wherein the obtaining the first plurality of images of the object further includes: obtaining, by the one or more processors, the first 3D model of the object. [0029] In some aspects, the techniques described herein relate to a method, wherein obtaining the first 3D model of the object includes: generating, by the one or more processors and one or more image-generating models, one or more views of the object based on a description of the object, wherein the first 3D model of the object is reconstructed based on at least a subset of the one or more generated views. ACTIVE 708712445v1 5
Docket No. 212860-700001/PCT [0030] In some aspects, the techniques described herein relate to a method, wherein: the plurality of encodings of the first plurality of images are a plurality of embeddings of the first plurality of images, and the amount of similarity between or among a respective group of two or more images is determined based on cosine similarity between the embeddings of respective pairs of the two or more images. [0031] In some aspects, the techniques described herein relate to a method, wherein a total number of the second plurality of images is less than a total number of the first plurality of images. [0032] In some aspects, the techniques described herein relate to at least one computer-readable storage medium encoded with computer-executable instructions that, when executed by at least one processor, cause the at least one processor to carry out a method including: obtaining a plurality of encodings of a first plurality of images of an object; calculating a plurality of scores based on the plurality of encodings, wherein each score indicates an amount of similarity between or among a respective group of two or more images of the first plurality of images; obtaining, based on the plurality of scores, a second plurality of images a second plurality of images based on scores for each image of the first plurality of images; and reconstructing, based on the second plurality of images, a three-dimensional (3D) model of the object. [0033] In some aspects, the techniques described herein relate to an at least one computer- readable storage medium, wherein the 3D model of the object is a second 3D model, and wherein obtaining the first plurality of images of the object includes: rendering a plurality of views of a first 3D model of the object from a plurality of viewpoints, wherein the first plurality of images include the plurality of views. [0034] In some aspects, the techniques described herein relate to an at least one computer- readable storage medium, wherein obtaining the second plurality of images includes: selecting, based on the plurality of scores, a subset of the first plurality of images, wherein the second plurality of images is the selected subset of the first plurality of images. [0035] In some aspects, the techniques described herein relate to an at least one computer- readable storage medium, wherein the plurality of views is a first plurality of views, the plurality of viewpoints is a first plurality of viewpoints, and obtaining the second plurality of images of the object includes: determining, based on the plurality of scores, a second plurality of viewpoints of the object; and rendering a second plurality of views of the first 3D model of the ACTIVE 708712445v1 6
Docket No. 212860-700001/PCT object from the second plurality of viewpoints, wherein the second plurality of images is the second plurality of views. [0036] In some aspects, the techniques described herein relate to an at least one computer- readable storage medium, wherein: the plurality of encodings of the first plurality of images are a plurality of embeddings of the first plurality of images, and the amount of similarity between or among a respective group of two or more images is determined based on cosine similarity between the embeddings of respective pairs of the two or more images. [0037] In some aspects, the techniques described herein relate to an at least one computer- readable storage medium, wherein a total number of the second plurality of images is less than a total number of the first plurality of images. [0038] In some aspects, the techniques described herein relate to a system including: at least one processor; and at least one storage medium having encoded thereon executable instructions that, when executed by the at least one processor, cause the at least one processor to carry out a method including: obtaining a plurality of encodings of a first plurality of images of an object; calculating a plurality of scores based on the plurality of encodings, wherein each score indicates an amount of similarity between or among a respective group of two or more images of the first plurality of images; obtaining, based on the plurality of scores, a second plurality of images a second plurality of images based on scores for each image of the first plurality of images; and reconstructing, based on the second plurality of images, a three-dimensional (3D) model of the object. [0039] In some aspects, the techniques described herein relate to a system, wherein the 3D model of the object is a second 3D model, and wherein obtaining the first plurality of images of the object includes: rendering a plurality of views of a first 3D model of the object from a plurality of viewpoints, wherein the first plurality of images include the plurality of views. [0040] In some aspects, the techniques described herein relate to a system, wherein obtaining the second plurality of images includes: selecting, based on the plurality of scores, a subset of the first plurality of images, wherein the second plurality of images is the selected subset of the first plurality of images. [0041] In some aspects, the techniques described herein relate to a system, wherein the plurality of views is a first plurality of views, the plurality of viewpoints is a first plurality of viewpoints, and obtaining the second plurality of images of the object includes: determining, based on the ACTIVE 708712445v1 7
Docket No. 212860-700001/PCT plurality of scores, a second plurality of viewpoints of the object; and rendering a second plurality of views of the first 3D model of the object from the second plurality of viewpoints, wherein the second plurality of images is the second plurality of views. [0042] In some aspects, the techniques described herein relate to a system, wherein: the plurality of encodings of the first plurality of images are a plurality of embeddings of the first plurality of images, and the amount of similarity between or among a respective group of two or more images is determined based on cosine similarity between the embeddings of respective pairs of the two or more images. [0043] In some aspects, the techniques described herein relate to a three-dimensional modeling method including: obtaining, by one or more processors, a first plurality of encodings of a first plurality of images of an object; generating, by the one or more processors and one or more image-generating models, a second plurality of encodings of a second plurality of images of the object, the second plurality of encodings being based on the first plurality of encodings and on conditioning data; and constructing, by the one or more processors based on the second plurality of encodings, a geometric form of a 3D model of the object. [0044] In some aspects, the techniques described herein relate to a method, wherein the 3D model of the object is a second 3D model of the object, the method further including: obtaining a first 3D model of the object; and rendering, from a plurality of viewpoints of the first 3D model of the object, the first plurality of images of the object. [0045] In some aspects, the techniques described herein relate to a method, wherein the first 3D model of the object is obtained from a vector database based on a description of the object. [0046] In some aspects, the techniques described herein relate to a method, further including: obtaining, from a vector database based on a description of the object, the first plurality of images of the object. [0047] In some aspects, the techniques described herein relate to a method, wherein the first plurality of encodings is a first plurality of embeddings, the second plurality of encodings is a second plurality of embeddings, and the generating the second plurality of embeddings includes: conditioning an image generation process performed in a latent space of the one or more image- generating models on the first plurality of embeddings and on one or more embeddings obtained from a vector database based on the conditioning data. ACTIVE 708712445v1 8
Docket No. 212860-700001/PCT [0048] In some aspects, the techniques described herein relate to a method, wherein the conditioning data include a description of the object and/or a description of one or more geometric attributes of the object. [0049] In some aspects, the techniques described herein relate to a method, wherein the generating the second plurality of embeddings further includes providing, by the one or more processors, the first plurality of embeddings as input to the one or more image-generating models. [0050] In some aspects, the techniques described herein relate to a method, wherein the one or more image-generating models include a latent, text-to-image diffusion model. [0051] In some aspects, the techniques described herein relate to a method, wherein the geometric form of the 3D model includes one or more shapes, dimensions, and/or orientations of one or more portions of the 3D model. [0052] In some aspects, the techniques described herein relate to a method, wherein the constructing the geometric form of the 3D model of the object includes: determining whether the geometric form of the 3D model of the object satisfies one or more criteria. [0053] In some aspects, the techniques described herein relate to at least one computer-readable storage medium encoded with computer-executable instructions that, when executed by at least one processor, cause the at least one processor to carry out a method including: obtaining a first plurality of encodings of a first plurality of images of an object; generating, by one or more image-generating models, a second plurality of encodings of a second plurality of images of the object, the second plurality of encodings being based on the first plurality of encodings and on conditioning data; and constructing, based on the second plurality of encodings, a geometric form of a 3D model of the object. [0054] In some aspects, the techniques described herein relate to an at least one computer- readable storage medium, wherein the 3D model of the object is a second 3D model of the object, the method further including: obtaining a first 3D model of the object; and rendering, from a plurality of viewpoints of the first 3D model of the object, the first plurality of images of the object. [0055] In some aspects, the techniques described herein relate to an at least one computer- readable storage medium, further including: obtaining, from a vector database based on a ACTIVE 708712445v1 9
Docket No. 212860-700001/PCT description of the object, the first 3D model of the object or the first plurality of images of the object. [0056] In some aspects, the techniques described herein relate to an at least one computer- readable storage medium, wherein the first plurality of encodings is a first plurality of embeddings, the second plurality of encodings is a second plurality of embeddings, and the generating the second plurality of embeddings includes: conditioning an image generation process performed in a latent space of the one or more image-generating models on the first plurality of embeddings and on one or more embeddings obtained from a vector database based on the conditioning data. [0057] In some aspects, the techniques described herein relate to an at least one computer- readable storage medium, wherein: the geometric form of the 3D model includes one or more shapes, dimensions, and/or orientations of one or more portions of the 3D model; and the constructing the geometric form of the 3D model of the object includes: determining whether the geometric form of the 3D model of the object satisfies one or more criteria. [0058] In some aspects, the techniques described herein relate to a system including: at least one processor; and at least one storage medium having encoded thereon executable instructions that, when executed by the at least one processor, cause the at least one processor to carry out a method including: obtaining a first plurality of encodings of a first plurality of images of an object; generating, by one or more image-generating models, a second plurality of encodings of a second plurality of images of the object, the second plurality of encodings being based on the first plurality of encodings and on conditioning data; and constructing, based on the second plurality of encodings, a geometric form of a 3D model of the object. [0059] In some aspects, the techniques described herein relate to a system, wherein the 3D model of the object is a second 3D model of the object, the method further including: obtaining a first 3D model of the object; and rendering, from a plurality of viewpoints of the first 3D model of the object, the first plurality of images of the object. [0060] In some aspects, the techniques described herein relate to a system, further including: obtaining, from a vector database based on a description of the object, the first 3D model of the object or the first plurality of images of the object. [0061] In some aspects, the techniques described herein relate to a system, wherein the first plurality of encodings is a first plurality of embeddings, the second plurality of encodings is a ACTIVE 708712445v1 10
Docket No. 212860-700001/PCT second plurality of embeddings, and the generating the second plurality of embeddings includes: conditioning an image generation process performed in a latent space of the one or more image- generating models on the first plurality of embeddings and on one or more embeddings obtained from a vector database based on the conditioning data. [0062] In some aspects, the techniques described herein relate to a system, wherein: the geometric form of the 3D model includes one or more shapes, dimensions, and/or orientations of one or more portions of the 3D model; and the constructing the geometric form of the 3D model of the object includes: determining whether the geometric form of the 3D model of the object satisfies one or more criteria. [0063] In some aspects, the techniques described herein relate to a three-dimensional modeling method including: obtaining, by one or more processors, a first encoding of a first image; generating, by the one or more processors and one or more image-generating models, a second encoding of a second image, the second encoding being based on the first encoding and on conditioning data; and mapping, by the one or more processors, the second image to a surface of a three-dimensional (3D) model of an object, thereby defining a texture and/or material of the surface of the 3D model. [0064] In some aspects, the techniques described herein relate to a method, wherein the 3D model of the object is a second 3D model of the object, the method further including: obtaining a first 3D model of the object; and rendering, from a viewpoint of the first 3D model of the object, a view of the surface of the model, wherein the first image includes at least a portion of the view. [0065] In some aspects, the techniques described herein relate to a method, wherein the first 3D model of the object is obtained from a vector database based on a description of the object. [0066] In some aspects, the techniques described herein relate to a method, wherein the first image is obtained from a vector database based on a description of the texture and/or material. [0067] In some aspects, the techniques described herein relate to a method, wherein the first encoding is a first embedding, the second encoding is a second embedding, and the generating the second embedding includes: conditioning an image generation process performed in a latent space of the one or more image-generating models on the embedding and on the conditioning data. ACTIVE 708712445v1 11
Docket No. 212860-700001/PCT [0068] In some aspects, the techniques described herein relate to a method, wherein the conditioning data include a description of one or more visual attributes and/or optical attributes of the object. [0069] In some aspects, the techniques described herein relate to a method, wherein the description of the one or more visual attributes and/or optical attributes of the object includes a description of a texture, material, shading, lighting, reflectivity, and/or color of a surface of the object. [0070] In some aspects, the techniques described herein relate to a method, wherein the generating the second embedding includes providing, by the one or more processors, the first embedding as input to the one or more image-generating models. [0071] In some aspects, the techniques described herein relate to a method, wherein the one or more image-generating models include a latent, text-to-image diffusion model. [0072] In some aspects, the techniques described herein relate to a method, further includes: determining whether the texture and/or material of the surface of the 3D model satisfy one or more criteria. [0073] In some aspects, the techniques described herein relate to at least one computer-readable storage medium encoded with computer-executable instructions that, when executed by at least one processor, cause the at least one processor to carry out a method including: obtaining a first encoding of a first image; generating, by one or more image-generating models, a second encoding of a second image, the second encoding being based on the first encoding and on conditioning data; and mapping the second image to a surface of a three-dimensional (3D) model of an object, thereby defining a texture and/or material of the surface of the 3D model. [0074] In some aspects, the techniques described herein relate to an at least one computer- readable storage medium, wherein the 3D model of the object is a second 3D model of the object, the method further including: obtaining a first 3D model of the object; and rendering, from a viewpoint of the first 3D model of the object, a view of the surface of the model, wherein the first image includes at least a portion of the view. [0075] In some aspects, the techniques described herein relate to an at least one computer- readable storage medium, wherein the first image is obtained from a vector database based on a description of the texture and/or material. ACTIVE 708712445v1 12
Docket No. 212860-700001/PCT [0076] In some aspects, the techniques described herein relate to an at least one computer- readable storage medium, wherein the first encoding is a first embedding, the second encoding is a second embedding, and the generating the second embedding includes: conditioning an image generation process performed in a latent space of the one or more image-generating models on the embedding and on an embedding obtained from a vector database based on the conditioning data. [0077] In some aspects, the techniques described herein relate to an at least one computer- readable storage medium, further includes: determining whether the texture and/or material of the surface of the 3D model satisfy one or more criteria. [0078] In some aspects, the techniques described herein relate to a system including: at least one processor; and at least one storage medium having encoded thereon executable instructions that, when executed by the at least one processor, cause the at least one processor to carry out a method including: obtaining a first encoding of a first image; generating, by one or more image- generating models, a second encoding of a second image, the second encoding being based on the first encoding and on conditioning data; and mapping the second image to a surface of a three-dimensional (3D) model of an object, thereby defining a texture and/or material of the surface of the 3D model. [0079] In some aspects, the techniques described herein relate to a system, wherein the 3D model of the object is a second 3D model of the object, the method further including: obtaining a first 3D model of the object; and rendering, from a viewpoint of the first 3D model of the object, a view of the surface of the model, wherein the first image includes at least a portion of the view. [0080] In some aspects, the techniques described herein relate to a system, wherein the first image is obtained from a vector database based on a description of the texture and/or material. [0081] In some aspects, the techniques described herein relate to a system, wherein the first encoding is a first embedding, the second encoding is a second embedding, and the generating the second embedding includes: conditioning an image generation process performed in a latent space of the one or more image-generating models on the embedding and on an embedding obtained from a vector database based on the conditioning data. ACTIVE 708712445v1 13
Docket No. 212860-700001/PCT BRIEF DESCRIPTION OF DRAWINGS [0082] The accompanying drawings illustrate a number of exemplary embodiments and are a part of the specification. Together with the following description, these drawings demonstrate and explain various principles of some embodiments. [0083] FIG.1A is a block diagram of an example system for constructing 3D models. [0084] FIG.1B is a block diagram of an example system for applying textures and/or materials to surfaces of 3D models. [0085] FIG.2 is a flowchart of an example method for constructing a 3D model of an object based on captured views of the object. [0086] FIG.3 is a flowchart of an example method for constructing a geometric form of a 3D model. [0087] FIG.4 is a flowchart of an example method for applying a texture and/or a material to a surface of a 3D model. [0088] FIG.5 is a flowchart of an example method for constructing a geometric form of a 3D model with textures and/or materials applied to surfaces of the 3D model. [0089] FIG.6 is a block diagram of an example computing device. [0090] FIG.7 shows some examples of workspace content. [0091] FIG.8 shows some additional examples of workspace content. [0092] FIG.9 shows block diagrams illustrating some examples of retrieval augmented construction (RAC) processes. [0093] FIG.10 shows examples of views of a 3D object, including a view synthesized using a photogrammetry process. [0094] FIGS.11A shows an example input view used during an example process of constructing a 3D model. [0095] FIGS.11B shows an example 3D model constructed based on the input view of FIG.11A during an example process of constructing a 3D model. [0096] FIGS.11C shows examples of candidate viewpoints from which example views of the 3D model of FIG.11B can be captured during an example process of constructing a 3D model. [0097] FIGS.11D shows an example 3D model constructed based on example views of the 3D model of FIG. 11C during an example process of constructing a 3D model. [0098] FIG.12 shows some examples of 3D models. ACTIVE 708712445v1 14
Docket No. 212860-700001/PCT [0099] FIG.13 shows some examples of 3D models illustrating the application of latent space conditioning during an example process of constructing a 3D model. [0100] FIG.14 shows a block diagram of an example tool-forming 3D pipeline. [0101] FIG.15 shows a block diagram of an example scalable AI infrastructure for 3D model generation. [0102] FIG.16 is a data flow diagram illustrating an example method for finding a 3D model that matches (e.g., most closely matches) a user query. [0103] FIG.17 is a block diagram of an example image-generating model. [0104] FIG.18 is a data flow diagram illustrating an example method for generating multi-view normal and color maps from single points of view. [0105] FIG.19 shows examples of 3D models of various objects. [0106] FIG.20A is a flowchart of an example process of generating a texture and applying the texture to a 3D model. [0107] FIG.20B is a flowchart of an example sub-process for normal maps. [0108] FIG.20C is a flowchart of an example sub-process for segmentation maps. [0109] FIG.20D is a flowchart of a UV map node arrangement process for unconventional materials. [0110] FIG.20E is a flowchart of another process of generating a texture. [0111] FIG.21 is a block diagram illustrating some operations performed by an example tool- forming pipeline while generating a 3D model of a car. [0112] FIG.22 shows example user interfaces of an example tool for styling and upscaling a 3D model. [0113] FIG.23A shows an example user interface of an example sampler tool. [0114] FIG.23B shows an example user interface of a 3D modeling tool. [0115] FIG.24A, 24B, and 24C show example sequences of inputs to an example 3D model generation tool and outputs from the tool during an example process of generating a 3D model of a sofa. [0116] FIG.24D, 24E, and 24F show example sequences of inputs to an example 3D model generation tool and outputs from the tool during an example process of generating a 3D model of a tiger head. ACTIVE 708712445v1 15
[0117] FIG.24G, 24H, and 24I show example sequences of inputs to an example 3D model generation tool and outputs from the tool during an example process of generating a 3D model of a treasure chest.

[0118] FIG.25A is a block diagram illustrating an example of a portion of a tool-forming process.

[0119] FIG.25B is a block diagram of an example blender model that includes a stable diffusion model and a ControlNet model.

[0120] FIGS.25C, 25D, and 25E show examples of similarity level tests.

[0121] Throughout the drawings and specification, identical reference characters and descriptions indicate similar, but not necessarily identical, elements. While the exemplary embodiments described herein are susceptible to various modifications and alternative forms, specific embodiments have been shown by way of example and will be described in detail herein. However, the exemplary embodiments described herein are not intended to be limited to the particular forms disclosed. Rather, the present disclosure covers all modifications, equivalents, alternatives, and/or subvariants of the descriptions provided herein.

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS

Introduction

[0122] When manually operated by an expert, some existing 3D modeling tools can produce high-quality 3D models of objects, but such modeling workflows tend to be very time-consuming and expensive. On the other hand, some existing 3D modeling tools can produce 3D models quickly and efficiently, but those models tend to be of low quality. The inventors have recognized and appreciated that the use of artificial intelligence (AI) techniques to integrate new workflows and/or expert-like decision-making into the modeling process can yield improved systems and methods for efficiently generating high-quality 3D models.

[0123] Traditional approaches to 3D model generation have leveraged reconstruction techniques in which sensor data regarding a real three-dimensional object (such as a collection of images of the object, or depth data from a scan of the object, or other information) is analyzed to create a 3D model that accurately represents the real object. AI-driven tools have been created that also attempt to create mimics of real objects by performing reconstruction based on images or scans, where the reconstruction is implemented using AI technologies.
[0124] As these techniques are aimed at creating accurate mimics of real-world objects, they do not allow for generating models of, for example, fanciful objects or other virtual objects that are not intended to be precise digital replicas of existing physical objects. Design and creation of 3D models for new objects - construction, as opposed to reconstruction - has traditionally focused on professional 3D object creation.

[0125] Existing professional 3D object creation processes involve a complex pipeline, with each step in the pipeline requiring and leveraging specialized knowledge and skills acquired through years of experience in the field. The professionals engaged to construct the 3D models make artistic choices matching their own aesthetic or the desired aesthetics of the project or person who commissioned the model. Various attributes of shape, style, color, contrast, brightness, and more are set by the artist in making the model from scratch, and the choices are made at specific points in the pipeline of the artist's creative process. These choices are subjective and vary from artist to artist and among different projects for the same artist.

[0126] Attempts have been made at construction of 3D models using AI tools, but these techniques do not mimic the artistic 3D construction process. Instead, these tools leverage the mathematics of computational processes for objective creation of models in response to direct inputs. User-provided input data regarding shape, color, style, and the like can be used to drive instantiation of a model based on that specific input. But this process requires that the user be able to specify objectively and with specificity the precise nature of their desired aesthetic as it relates to individual attributes of the model. Users who lack the vocabulary to specify what they want, or who are unaware of or unsure about their own aesthetic preferences, are unable to use these tools to generate models having a desired aesthetic. By eschewing professional workflows, these attempts at an AI solution may produce 3D objects, but such objects lack the quality, detail, and authenticity characteristic of professionally created models.
[0127] Described herein are various techniques for configuring and leveraging AI tools in a manner that infuses 3D object generation with an artistic, constructive workflow. Constructive workflows, such as those used by some embodiments described herein, represent a significant departure from conventional reconstruction-based methods of 3D model generation. Further, some methods described herein may allow for a specification of artistic choices in a manner other than numeric or textual input. Some embodiments may yield more authentic, higher-quality 3D object generation.

[0128] Some embodiments described herein may leverage workspace conditioning techniques to produce high-quality 3D objects. The use of workspace conditioning may provide AI-based model-generating tools with an ability to interpret creator requests and generate corresponding goals and constraints, which may be used to guide the object creation process. In some examples, a "tool-forming" process involves dynamically configuring and manipulating the tools and resources within a professional 3D object creation pipeline to achieve the desired outcomes. In some examples, an expert system uses goals and constraints derived from the creator's request to steer the tool-forming process, ensuring that the final 3D model produced by the pipeline exhibits the desired attributes. More particularly, in some examples described herein, workspace conditioning may involve analyzing a creator's (e.g., a user's) artistic workspace, which may contain concept art, reference images, design documents, and other relevant materials. By incorporating this information into an AI-driven 3D model generation process using techniques described herein, a 3D model can be generated that may be more closely aligned with the creator's creative vision.

[0129] In some embodiments described in greater detail below, a model-generation workflow may leverage a multi-modal embedding space to facilitate AI-driven generation of models that align with a desired aesthetic. With such a multi-modal embedding space, text, images/graphics, 3D models, textures, and other forms of content may be collectively mapped to locations in a multi-dimensional coordinate space that mathematically represents the attributes of the content. Content to which a conditioning model has been exposed in training, such as images, 3D models, text, textures, and so on, and concepts reflected in the content, may each have a position in the embedding space. Similarities or differences between concepts may be quantified by the embedding space, based on geometric distances between coordinates. That embedding space may be used alone to retrieve content of one or more formats (e.g., images, 3D models, etc.) based on a query (e.g., a text query). Advantages may be obtained by conditioning a model generation process using the retrieved content. For example, a representation of an image having the desired aesthetic characteristics may be retrieved based on a query characterizing or suggesting the desired aesthetic characteristics, and the representation of the image may be used to condition the process of generating a model, such that the generated model exhibits the desired aesthetic characteristics.
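By way of non-limiting illustration only, the following sketch shows one possible way that retrieval from a shared multi-modal embedding space could be implemented. The encoder that would produce the embeddings, the example content items, and the use of cosine similarity as the distance measure are assumptions made solely to make the idea concrete; they do not describe or limit any particular embodiment.

```python
# Illustrative sketch (assumptions only): retrieving content from a shared
# multi-modal embedding space by comparing embedding vectors. The embeddings
# below are made up; in practice they would be produced by one or more encoders.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity of two embedding vectors; larger values indicate closer content."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_vec: np.ndarray, index: list, k: int = 3) -> list:
    """Return the k indexed items whose embeddings are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in index]
    scored.sort(reverse=True)
    return scored[:k]

# Hypothetical index mixing content of different formats in one embedding space.
index = [
    ("castle_concept_art.png", np.array([0.9, 0.1, 0.0, 0.2])),
    ("stone_texture.png",      np.array([0.2, 0.8, 0.1, 0.0])),
    ("sofa_model.glb",         np.array([0.0, 0.1, 0.9, 0.3])),
]
query = np.array([0.85, 0.2, 0.05, 0.1])  # e.g., an embedding of the text "a medieval castle"
nearest = retrieve(query, index, k=2)     # retrieved items may then condition generation
```

The retrieved items (or their embeddings) could then be supplied as conditioning data to an image-generating model, as described elsewhere herein.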
Docket No. 212860-700001/PCT [0130] In some embodiments described herein, an AI tool may leverage input content and the embedding space to define, in a latent embedding space, a manifold interconnecting different points in the embedding space that each are determined to relate to the input content. That manifold and the points it interconnects may be understood as a mathematical representation, determined by the AI tool, of various aspects of the desired aesthetic that is characterized or suggested by the various inputs. By leveraging this manifold and its embedding space interconnections in the model generation process, an AI tool may be able to objectively account for desired aesthetics even in a case that a user may not be able to verbalize or specify what those desired aesthetics are. In some such embodiments, an AI tool may be able to objectively quantify a match between a 3D model output using the techniques and one or more of the inputs provided, by identifying a geometric distance between a coordinate in the embedding space representing the output and the coordinate(s) representing the inputs. Such an objective measure of match may also be used in the 3D model generation process, such as through the model generation being constrained to create a representation that has a measure of match (e.g., a geometric distance in the embedding space) of no more than a set distance or other criterion. [0131] Through techniques such as these, some embodiments described herein may allow for specification of a desired visual aesthetic or other artistic choices through input of sample materials, which may represent a user artist’s prior work, or work similar to what is desired, etc. The samples may specify artistic style, shape, color, lighting, contrast, and the like, some or all of which may be used in generating an output 3D model. Through this streamlined workflow for specifying aesthetic choice and analyzing and representing aesthetics within the AI, some embodiments may enable generation of 3D models of higher quality and better fidelity to desired artistic characteristics. [0132] Described herein are some examples of implementations of systems and methods for generating 3D models using artificial intelligence (AI). For example, these systems and methods may involve and/or integrate new workflows and/or expert decision-making into some embodiments of AI tools that cause one or more computing devices to generate 3D models of objects. By doing so, these systems and methods may facilitate, support, and/or provide one or more AI tools that efficiently generate more authentic, higher quality 3D graphical objects for use in video games, simulations, online environments, etc. ACTIVE 708712445v1 19
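As a further non-limiting illustration of the objective measure of match discussed above, the sketch below measures the geometric distance, in an embedding space, between an embedding of a generated output and embeddings of the provided inputs, and tests that distance against a maximum-distance criterion. The function names, the choice of Euclidean distance, and the threshold value are assumptions chosen only for illustration.

```python
# Illustrative sketch (assumptions only): quantifying how closely a generated
# output matches the provided inputs via distance in a shared embedding space,
# and enforcing a maximum-distance criterion during generation.
import numpy as np

def embedding_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean distance between two points in the embedding space."""
    return float(np.linalg.norm(a - b))

def within_match_criterion(output_vec: np.ndarray,
                           input_vecs: list,
                           max_distance: float = 0.5) -> bool:
    """True if the output embedding lies within max_distance of every input embedding."""
    return all(embedding_distance(output_vec, v) <= max_distance for v in input_vecs)

# Hypothetical embeddings of a generated 3D model and of two reference inputs.
output_embedding = np.array([0.4, 0.6, 0.1])
reference_embeddings = [np.array([0.5, 0.5, 0.2]), np.array([0.3, 0.7, 0.0])]

if not within_match_criterion(output_embedding, reference_embeddings):
    # e.g., trigger another refinement iteration, or reject the candidate output
    pass
```

A generation process could, for example, repeat until this criterion is satisfied, which corresponds to constraining the output to lie within a set distance of the inputs in the embedding space.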
Docket No. 212860-700001/PCT [0133] In some examples, 3D objects having aesthetic attributes aligned with the aesthetic attributes of the user- and/or project-specific artistic content (“contextual data”) are generated using an iterative process of image-based model construction (e.g., synthetic photogrammetry), view capture, and image generation via latent space conditioning. During this process, an initial model of an object may be generated. In some cases, the initial model may be basic (e.g., may have generic aesthetic qualities for the type of object being generated) and/or incomplete (e.g., portions of the model may be ill-formed or noisy). Synthetic 2D images (e.g., “views”) of the model from various viewpoints may be obtained. These synthetic 2D images may be provided as input to an AI model, which may generate new images based on the input images. The AI model may infer the existence of attributes of the object that are unclear in the input images, and may add representations of those attributes to the new images. Furthermore, the embeddings representing user- and/or project-specific artistic content may be used to condition the process by which the AI model generates the new images, such that the aesthetic of the new images is more aligned with the aesthetic of the user- and/or project-specific artistic content. In other words, the new images generated by the AI model may “fill in” details that are unclear or inconsistent in the input images, and may alter the object’s aesthetic in the images. In some examples, the AI model may be a diffusion model (e.g., a stable diffusion model). Using image based 3D model construction techniques (e.g., synthetic photogrammetry), a new 3D model of the object may be generated based on the new images. This process may be repeated one or more times until a high-quality 3D model of the object aligned with the aesthetic of the user- and/or project-specific artistic content is generated. See, e.g., the description of “synthetic data generation,” “object creation,” and “latent space manipulation” in the section titled “Some Examples.” [0134] During the process of generating the 3D model, a novel technique may be used to identify and/or select the viewpoints from which the synthetic 2D images (or “views”) are obtained. Relative to other techniques for selecting viewpoints, the technique described herein may capture comprehensive views of an object using fewer viewpoints. Thus, using this technique may yield a smaller (e.g., minimal) set of views without sacrificing the quality of the photogrammetric process. Substantial computational resources may be used to process each 2D image during each iteration of the process of generating the 3D model. Thus, by reducing (e.g., minimizing) the number of 2D images used during each iteration, the viewpoint selection technique described herein can greatly enhance the speed and efficiency with which a high-quality 3D model of an ACTIVE 708712445v1 20
Docket No. 212860-700001/PCT object is generated. See, e.g., the descriptions of view capturer 152 and view generation method 200 below, and the description of “Improved Angle Selection for Photogrammetry” in the section titled “Some Examples.” [0135] An example of the above-described iterative model-generation process is illustrated in FIGS.11A-11D, In this example, a high-quality 3D model of a castle is generated based on a single image of the castle (or even based on a text description of the castle, without any images of the castle) using an iterative process involving image-based 3D model construction (e.g., synthetic photogrammetry), view capture, and AI-based image generation conditioned by the embeddings of the contextual data provided by the user. [0136] FIG.11A illustrates step (A) of the process. In step (A), the user provides a prompt which includes a description of the castle to be modeled. That description may include text (e.g., “a medieval castle”), zero or more images (e.g., an overhead image of the castle), or other types of data. If no images are provided, the system can generate concept art (e.g., one or more conceptual views of the castle) and display that concept art to the user for approval, before proceeding to steps (B)-(D). [0137] FIG.11B illustrates step (B) of the process. In step (B), the expert system controls a CAD tool (e.g., a 3D model reconstruction tool) to produce the initial construction of the castle based on the user’s description (or based on the system-generated concept art). [0138] FIG.11C illustrates step (C) of the process. In step (C), the viewpoints (e.g., optimal viewpoints) for view capture are identified, and 2D views of the model from those viewpoints are obtained through simulation. [0139] FIG.11D illustrates step (D) of the process. In step (D), the synthetic 2D views of the model are provided as inputs to an AI model (e.g., stable diffusion model), which generates improved versions of those views that are aligned with the contextual data derived from the user’s content. Then the improved versions of those views are used to construct a new 3D model. If the new model is adequate, the iterative process ends. If the new model is not adequate, the process returns to step (C) for another iteration. Some Examples of Modeling Systems and Methods [0140] Referring to FIG. 1A, a modeling system 100 may be configured to generate 3D models of objects (e.g., virtual objects) using a constructive workflow. In some embodiments, the ACTIVE 708712445v1 21
modeling system 100 may iteratively construct 3D models 144 having different structures (e.g., different geometric forms) based on (1) a prompt (e.g., a text prompt) characterizing the modeled object and/or a desired aesthetic of the object, (2) conditioning data 127 objectively and mathematically representing the characteristics of the object and/or the aesthetic, and/or (3) an existing representation of the object (e.g., a 3D model 144 of the object or a set of one or more views of the object). The modeling system 100 may include an image-generating model 120, a prompt facility 130, a 3D model generator 150, and a view capturer 152. Some embodiments of these components and processes jointly performed by these components are described in further detail below.

[0141] In some examples, the image-generating model 120 includes an autoencoder 122, a neural network 124, a vector database 128, and a conditioning model 126. In some examples, the image-generating model 120 is configured to generate output images (e.g., output views 142 of an object) based on input images (e.g., input views 140 of the object) and on conditioning data 127. The conditioning data 127 may be provided, for example, by a conditioning model 126 based on conditioning information received from the prompt facility 130. In some examples, the conditioning model encodes the conditioning information (e.g., generates an embedding of the conditioning information in the latent embedding space of the vector database 128), searches the vector database 128 for one or more embeddings similar to (e.g., most similar to) the embedding of the conditioning input, and provides those one or more embeddings to the neural network 124 of the image-generating model 120 as conditioning data 127. The image-generating model 120 may use the conditioning data 127 to condition a process by which the neural network 124 generates new embeddings representing the output views 142 based on the embeddings representing the input views 140.

[0142] In some examples, the image-generating model 120 is a diffusion model (e.g., a conditioned stable diffusion model). An example architecture of a conditioned stable diffusion model is illustrated in FIG.17. In some examples, the image-generating model 120 is a blender model. An example architecture of a blender model including a stable diffusion model and a ControlNet model is illustrated in FIG.25B.

[0143] Conceptually, in some examples, the image-generating model 120 may modify and/or manipulate the internal representations (e.g., embeddings) of input images to generate new output images (a process referred to herein as "latent space manipulation"). In one example, the
Docket No. 212860-700001/PCT AI-based system may access the latent space of the image-generating model 120 (e.g., a multi- dimensional space in which the image-generating model’s internal representations are stored). By doing so, the AI-based system may access the encoded features and/or characteristics (e.g., shapes, textures, colors, etc.) of the 3D graphical object. The AI-based system may then modify and/or manipulate such encoded features and/or characteristics until achieving a 3D graphical object that aligns with the creator’s goals and/or constraints. Such modifications and/or manipulation may be repeatedly and/or continuously performed based on feedback and/or performance metrics to refine the outcome of the 3D graphical object. [0144] Still referring to FIG. 1A, in some examples the modeling system 100 performs workspace analysis to populate a vector database 128 with embeddings representing conditioning content 110. In some examples, the conditioning content 110 includes artistic content (e.g., 3D models, 2D images, concept art, videos, music, etc.) associated with a user and/or a project. Some non-limiting examples of conditioning content are shown in FIGS. 7-8. [0145] In some examples, obtaining the vector database 128 of embeddings representing artistic content involves identifying such artistic content, creating embeddings representing the artistic content (e.g., using one or more encoders), and projecting those embeddings into a shared latent space. A non-limiting example of a process of building a vector database 128 of embeddings representing artistic content is shown in FIG.16. In some examples, the embeddings 113 of the artistic content are created by a multi-modal indexer 112. [0146] The embeddings stored in the vector database 128 can be used to condition the generation of images by the image-generating model 120, such that the generated images are aesthetically consistent with the user- and/or project-specific artistic content. See, e.g., the description of “creative input” and “workspace analysis” in the section titled “Some Examples.” [0147] In some examples, the system 100 may receive and/or obtain creative input related to a 3D graphical object. For example, the creative input may originate from a human operating a computing device that executes and/or has access to the system 100. Additionally or alternatively, the creative input may originate from a local or remote computing device or system. Regardless of whether the creative input originates from a human or a computing device, the entity providing the creative input may be referred to herein as the creator. In certain implementations, the system 100 may be able to gain and/or form an understanding of the creator’s vision and/or requirements for the desired 3D graphical object based at least in part on ACTIVE 708712445v1 23
Docket No. 212860-700001/PCT the creative input. Examples of such creative input include, without limitation, natural language descriptions of desired 3D graphical objects, natural language translations of human inputs, concept art depictions, reference images, photographs, design documents, point-and-click inputs, combinations or variations of one or more of the same, and/or any other suitable creative input. [0148] In some examples, the system 100 may perform and/or execute a workspace analysis on the computing device that originates and/or provides the creative input. In one example, the system 100 may analyze and/or evaluate the contents of the computing device. For example, the multi-modal indexer 112 may apply and/or perform multi-modal indexing on the files, documents, objects, code, images, repositories, directories, codebases, and/or models found on the computing device. In this example, the system 100 may analyze and/or evaluate such contents to gain insight into and/or understanding of the creator’s workspace structure, features, and/or relationships. In certain implementations, such insight and/or understanding may provide, serve as, and/or function as the foundation and/or basis for determining, identifying, and/or extrapolating the creator’s goals and/or constraints for the 3D graphical object. In other words, the system 100 may be able to tailor the goals and/or constraints for the 3D graphical object beyond the creative input based at least in part on such insight and/or understanding. [0149] In some examples, the system 100 may translate and/or convert the creator’s goals and/or constraints (e.g., as indicated by a prompt provided to the image-generating model 120 by the prompt facility 130) into actionable parameters that guide the development of the 3D graphical object. In one example, the system 100 may rely on the vector database 128 to interpret the creator’s intentions based at least in part on the workspace analysis and the creative input. In this example, the system 100 may then generate a set of goals and/or parameters that define and/or outline requirements of the 3D graphical object and/or boundaries within which the 3D graphical object is developed based at least in part on the creator’s goals, constraints, and/or intentions. [0150] In some examples, the vector database 128 may include and/or represent general or specific rules, industry standards, best practices, decision trees, behavior trees, and/or technical specifications related to 3D model generation. In one example, the system 100 may translate and/or convert the creator’s goals, constraints, and/or corresponding actionable parameters into a structured framework that guides part or all of the object creation process, thereby ensuring that the constructed 3D model complies with and/or satisfies the creator’s vision and/or expectations. In this example, the structured framework may provide, serve as, and/or function as the ACTIVE 708712445v1 24
foundation and/or basis for selecting and/or identifying suitable authoring tool(s) and/or artistic technique(s) needed to create the 3D graphical object.

[0151] In some examples, the system 100 may perform and/or execute extraction, transformation, and/or loading operations (collectively referred to herein as "deep ETL") by delving into the intricacies of the 3D graphical objects. In one example, deep ETL may involve performing data extraction, transformation, and loading processes to gain insight into and/or understanding of the 3D graphical object's structure and/or properties. For example, deep ETL may entail extracting geometric details, textures, materials, and/or other attributes relevant to the creation of the desired 3D graphical object. In this example, deep ETL may further entail changing the formatting and/or normalizing certain values across such data. Additionally or alternatively, the system 100 may load the resulting deep ETL data into the vector database 128 to enrich its utility.

[0152] A non-limiting example of a process of querying the vector database 128 of the conditioning model 126 of the image-generating model 120 is shown in FIG.16. A non-limiting example of a process of performing retrieval augmented generation (RAG) with a conditioning model 126 is shown in FIG. 9. In FIG.19, some non-limiting examples of base models 1902, models 1904 conditioned with a first aesthetic, and models 1906 conditioned with a second aesthetic are shown.

[0153] Still referring to FIG.1A, in some examples the modeling system 100 uses the prompt facility 130, image-generating model 120, 3D model generator 150, and view capturer 152 to construct a 3D model of the geometric form of an object. Some examples of aspects and embodiments of the image-generating model 120 are described herein.

[0154] In some examples, the prompt facility 130 can initiate a process of constructing a 3D model of an object by retrieving an initial model of the object or a set of views of the object from the vector database 128. In some examples, the initial model of the object is a base 3D model (e.g., simple mesh). Some non-limiting examples of base 3D models are shown in FIG. 12. In some examples, the prompt facility may provide a query to the image-generating model 120. The query may provide a description of an object and request an initial model and/or initial views of the object. The conditioning model 126 may encode the description of the object (e.g., may generate an embedding of the description in the latent embedding space of the vector database 128), search the vector database 128 for one or more embeddings similar to (e.g., most similar
to) the embedding of the description, and provide a model and/or a set of images corresponding to those embeddings to the prompt facility 130. The system 100 may then use the retrieved model as an initial version of 3D model 144 or as an initial set of input views 140 during the modeling process.

[0155] In some examples, the prompt facility 130 may orchestrate an iterative modeling process in which images of an object (e.g., rendered images of a 3D model 144 of the object) are provided to the image-generating model 120 as input views 140, a prompt (e.g., a prompt including a query and conditioning information) is provided as input to the image-generating model 120 by the prompt facility 130, the image-generating model 120 generates new images (e.g., output views 142) of the object based on the input views 140 and the conditioning information, a 3D model generator 150 constructs a new 3D model 144 of the object based on the output views 142, a view capturer obtains more images of the object (e.g., rendered images of the new 3D model 144 of the object), etc.

[0156] In some examples, the 3D model generator 150 constructs a 3D model 144 from output views 142 using any suitable 3D model reconstruction techniques (e.g., any combination of multi-view reconstruction, depth estimation from stereo images, single image 3D reconstruction, volumetric reconstruction, neural rendering, synthetic photogrammetry, etc.). The technique used for constructing the 3D model of the object can vary depending on what types of views of the object are available. For example, when multiple views are captured from different angles, multi-view reconstruction can be used, whereas when multiple images are captured from a similar angle (e.g., an overhead view), depth estimation from stereo images can be used.

[0157] In some examples, the view capturer generates images of an object from various viewpoints by rendering views of the 3D model 144 from those viewpoints. In some examples, the view capturer 152 uses simulation and photogrammetry to render the views. In some examples, the view capturer 152 performs an improved process for view generation as described in further detail herein (e.g., with reference to FIG.2). A non-limiting example of the operation of the view capturer 152 is shown in FIG.10.

[0158] In some examples, the system 100 may construct 3D models that implement realistic textures, materials, and/or other modeling components (e.g., using synthetic photogrammetry techniques). In these examples, the system 100 utilizes and/or relies on synthetic data to provide accurate, detailed representations of objects and/or surfaces. In one example, the system 100 may
Docket No. 212860-700001/PCT generate synthetic two-dimensional (2D) image data from 3D models and then use such 2D image data to train and/or condition the image-generating model 120. Additionally or alternatively, the system 100 may construct a 3D model from such 2D image data. [0159] In some examples, the view capturer 152 performs an improved process of viewpoint selection for photogrammetry to provide comprehensive coverage of the object’s details from one or more informative perspectives (e.g., the most informative perspectives) using a limited (e.g., minimal) number of views. For example, based on one or more images (e.g., captured by cameras or rendered from a 3D model), the view capturer 152 may analyze the object represented in the images from various viewpoints. In this example, the view capturer 152 may score those angles/views relative to one another based on their similarities and/or dissimilarities to identify the angles/views that provide a diverse and/or meaningful data set (e.g., the most diverse and/or meaningful data set) corresponding to the object represented in the images. Those selected viewpoints can then be used by the system 100 to construct a 3D model of the object. [0160] In some examples, the system 100 performs object-construction operations by utilizing the authoring tool(s), settings, and/or artistic technique(s) selected in the tool-forming operation to create the 3D model. In one example, the system 100 models, textures, rigs, animates, and/or refines the 3D model via the authoring tool(s), settings, and/or artistic technique(s). In this example, the modeling process may involve creating a geometric form and/or shape helpful to achieve the proper aesthetic in view of the creator’s goals and/or constraints. [0161] Referring to FIG.1B, a modeling system 160 may be configured to apply textures and/or materials to surfaces of 3D models using a constructive workflow. In some embodiments, the modeling system 160 may iteratively construct textured 3D models 192 having different textures and/or materials applied to their surfaces based on (1) a prompt (e.g., a text prompt) characterizing the desired aesthetic of the textures and/or materials, (2) conditioning data 127 objectively and mathematically representing the characteristics of textures and/or materials having the aesthetic, and/or (3) an existing 3D model 180 of an object. The modeling system 160 may include an image-generating model 120, a prompt facility 130, a texture application tool 190, and a view capturer 152. Some embodiments of these components and processes jointly performed by these components are described in further detail below. [0162] In some examples the modeling system 160 uses the prompt facility 130, image- generating model 120, texture application tool 190, and view capturer 152 to construct a textured ACTIVE 708712445v1 27
Docket No. 212860-700001/PCT 3D model 192 of an object. In some examples, the prompt facility 130 can initiate a process of texturing a 3D model of an object by retrieving a 3D model 180 of the object and at least one image representing a texture from the vector database 128. In some examples, the 3D model 180 of the object is a base 3D model (e.g., simple mesh) or 3D model 144 generated by the system 100 of FIG.1A. [0163] In some examples, the prompt facility may provide a query to the image-generating model 120. The query may provide a description of a texture and request an image representing the texture. The conditioning model 126 may encode the description of the texture (e.g., may generate an embedding of the description in the latent embedding space of the vector database 128), search the vector database 128 for one or more embeddings similar to (e.g., most similar to) the embedding of the description, and provide an image corresponding to the embedding to the prompt facility 130. The system 160 may then use the retrieved image as an initial version of input image 170 or output image 172 during the texturing process. [0164] In some examples, the prompt facility 130 may orchestrate an iterative texturing process in which images of an object (e.g., rendered images of a textured 3D model 192 of the object) are provided to the image-generating model 120 as input images 170, a prompt (e.g., a prompt including a query and conditioning information) is provided as input to the image-generating model 120 by the prompt facility 130, the image-generating model 120 generates a new image (e.g., output image 172) of the object based on the input image 170 and the conditioning information, a texture application tool 190 applies the texture depicted in the output image 172 to a surface of a 3D model 180 to construct a textured 3D model 192, a view capturer 152 obtains another image of the object (e.g., a rendered image of the new textured 3D model 192 of the object), etc. [0165] In some examples, the texture application tool 190 applies a texture and/or material depicted in an output image 172 to at least one surface of a 3D model 180 using any suitable texturing techniques. Some non-limiting examples of texture-application techniques are shown in FIG.18 and FIGS.24A-I. Some non-limiting examples texture-application methods optionally performed by the texture application tool 190 are shown in the flowcharts of FIGS.20A-20E. Additionally or alternatively, the texturing of a 3D model may involve applying the synthetic data, such as textures and/or materials, to the geometric form and/or shape to achieve the desired ACTIVE 708712445v1 28
realistic appearance. The system 160 may map the textures onto the geometric form and/or shape of the object to provide the proper color, reflectivity, and/or surface texture.

[0166] In some examples, the functionality of modeling systems 100 and 160 can be combined in a joint system that generates 3D models of objects and applies textures/materials to the surfaces of those models using a combined constructive workflow. Conceptually, the 3D model generator 150 can implement the functionality of the texture application tool 190, such that the modeling system 100 constructs textured models.

[0167] FIG.2 depicts a flowchart of an example method 200 for viewpoint selection for image-based construction (e.g., reconstruction) of a three-dimensional (3D) representation (e.g., model) of an object. In some examples, the method includes steps 202-210. In some examples, the method is performed by a view capturer (e.g., view capturer 152 of modeling system 100 or 160).

[0168] At step 202, the view capturer 152 obtains first images of the object. In some examples, the first images include images of a physical object (e.g., images captured by a camera). In some examples, the first images include images derived from a 3D representation of the object. For example, the first images can include synthetically created images of the object, which can include rendered images of a 3D representation of the object. In some examples, the first images depict views of the object from a set of viewpoints (e.g., angles or perspectives). The location and number of viewpoints for the first images can depend on attributes of the object (e.g., the geometric form of the object).

[0169] In some examples, the view capturer obtains encodings (e.g., embeddings) of the first images. The encodings or embeddings can be extracted or otherwise derived from the first images using any suitable techniques (e.g., using an encoder, which can be a pre-trained deep learning model). The encoder can be trained to generate feature representations (e.g., embeddings) from images, which can then be used for image retrieval, classification, clustering, similarity measurement, etc. Such encoders can be implemented using Convolutional Neural Networks (CNNs), Vision Transformers (ViTs), etc. In some examples, the encoder used to generate the encodings of the first images is similar or identical to the encoder of the image-generating model 120.

[0170] For scenarios in which the first images include images derived from a 3D representation of the object, the 3D representation can be obtained using any suitable techniques. In some examples, a 3D model generator 150 generates the 3D representation of the object based on a set
of images of the object. The set of images can be obtained using any suitable techniques. In some examples, the object is a physical object and the set of images includes camera-captured images of the physical object. In some examples, the object is a virtual object and the set of images includes images of the object provided by an image-generating model 120.

[0171] At step 204, the view capturer calculates scores (e.g., similarity scores) based on the encodings of the first images. The scores can be calculated using any technique that provides a score indicating an amount (e.g., degree) of similarity between or among a respective group of two or more of the first images. Techniques for calculating similarity between images can include comparing images based on their pixel values, features, or embeddings. In some examples, the similarity score for two images is determined based on a measure of the similarity between the two images' encodings (e.g., embeddings). For example, the similarity score for two images can be determined based on the cosine similarity, dot product similarity, or Euclidean distance between vector embeddings of the two images.

[0172] At step 206, the view capturer obtains, based on the scores, second images of the object. In some examples, the second images include one or more of the first images (e.g., a selected subset of the first images). In some examples, the subset of images can be selected based on their similarity or lack of similarity. For example, a subset of the first images exhibiting a high level of variation from one another (e.g., images having low similarity scores) can be selected. Selecting images that have greater variation (e.g., less similarity) can be beneficial, because a model generated from such a set of images can be more complete and/or higher quality. In some embodiments, the number of second images is less than the number of first images.

[0173] In some examples, the second images include one or more images created (e.g., captured using a camera or rendered based on the 3D representation of the object) from new viewpoints not included within the viewpoints of the first images, for example, as shown in FIG.11C. The new viewpoints for the second images can be selected based on the similarity scores for the first images.

[0174] At step 208, a 3D model generator 150 generates (e.g., constructs) an updated 3D representation of the object, based on the second images, for example, as shown in FIG. 11D.

[0175] At step 210, a determination is made as to whether the updated 3D representation of the object is satisfactory. If so, the method 200 can end. Otherwise, the method can return to step 202 and repeat the following steps until a satisfactory 3D representation of the model is obtained.
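By way of non-limiting illustration of steps 204 and 206, the following sketch shows one possible way to select a small, mutually dissimilar subset of views based on pairwise similarity of their embeddings. The greedy selection strategy, the function names, and the randomly generated embeddings are assumptions made only for illustration and do not limit how the view capturer 152 may be implemented.

```python
# Illustrative sketch (assumptions only): selecting a small, mutually dissimilar
# subset of views from a larger set, based on cosine similarity between view
# embeddings. The greedy strategy shown here is only one possible approach.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_diverse_views(embeddings: list, num_views: int) -> list:
    """Greedily pick indices of views whose embeddings are least similar to
    the views already selected (i.e., the most informative additions)."""
    selected = [0]  # start from an arbitrary seed view
    while len(selected) < min(num_views, len(embeddings)):
        best_idx, best_score = None, float("inf")
        for i, emb in enumerate(embeddings):
            if i in selected:
                continue
            # A candidate's score is its highest similarity to any already-selected
            # view; a lower score means the candidate adds a more novel viewpoint.
            score = max(cosine_similarity(emb, embeddings[j]) for j in selected)
            if score < best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
    return selected

# Example: choose 3 of 6 candidate viewpoints from made-up 8-dimensional embeddings.
rng = np.random.default_rng(0)
view_embeddings = [rng.random(8) for _ in range(6)]
chosen = select_diverse_views(view_embeddings, num_views=3)
```

Other selection strategies (e.g., clustering the embeddings and sampling one view per cluster) could equally be used.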
The determination of whether a 3D representation of the object is satisfactory can be made using any suitable technique. For example, the determination can be based on a user input or it can be automatically determined by analyzing one or more attributes of the updated 3D representation.

[0176] As would be appreciated by one skilled in the art, during a second or subsequent iteration of the steps 202-210, the first images, second images, and 3D representation of the object can differ from the first images, second images, and 3D representation of the object used in prior iterations of the steps 202-210.

[0177] FIG.3 depicts a flowchart of an example method 300 for constructing (e.g., reconstructing) a geometric form of a 3D model of an object. In some examples, the method includes steps 302-310. In some examples, the method is performed by a modeling system 100.

[0178] At step 302, optionally, a first 3D model of the object is obtained. In some examples, the first 3D model of the object is obtained from a vector database. The first 3D model of the object can be obtained from the database based on a query (e.g., user-provided query). For example, the first 3D model can be retrieved based on at least one of a received description of the object, one or more images of the object (or similar object), or a combination thereof.

[0179] At step 304, first encodings (e.g., first embeddings) of first images of the object are obtained. In some examples, the first images include images of a physical object (e.g., images captured by a camera). In some examples, the first images include images derived from a 3D model of the object (e.g., a 3D model of the object constructed during a previous iteration of steps 304-310). For example, the first images can include synthetically created images of the object, which can include rendered images of a 3D model of the object. In some examples, the first images depict views of the object from a set of viewpoints. The first images of the object can be obtained from a database (e.g., vector database 128), provided by a view capturer 152, or obtained from any other suitable source.

[0180] In some examples, an encoder of the image-generating model 120 extracts or otherwise derives the first encodings from the first images. In some examples, one or more of the first images and the encodings corresponding thereto are obtained from a vector database 128 based on a query. The query can include a description of the object, one or more images of the object (or similar object), or a combination thereof.

[0181] At step 306, the image-generating model 120 generates second encodings of second images of the object. The second encodings may be based on the first encodings and on
conditioning data. The conditioning data can include a description (e.g., text description) of the object and/or a description of one or more geometric attributes of the object. In some examples, the second encodings are second embeddings, and the second embeddings are generated by conditioning an image generation process performed in a latent space of the image-generating model 120 on the first embeddings and/or on the conditioning data. For example, generating the second embeddings can include providing the first embeddings as input to the image-generating model 120. The image-generating model 120 can include any combination of models for generating images from one or more embeddings. For example, the image-generating model 120 can include a latent, text-to-image diffusion model.

[0182] At step 308, the 3D model generator 150 constructs, based on the second encodings, a geometric form of a 3D model (e.g., a new or updated 3D model) of the object, thereby defining one or more shapes, dimensions, and/or orientations of one or more portions (e.g., surfaces) of the 3D model.

[0183] At step 310, the system 100 determines whether the geometric form of the 3D model of the object satisfies one or more criteria. If so, the method 300 can end. Otherwise, the method can return to step 304 and repeat the following steps until a satisfactory geometric form is obtained. The system 100 can determine whether the geometric form of a 3D model of an object is satisfactory using any suitable technique. For example, the determination can be based on a user input and/or on the system's analysis of one or more attributes of the 3D model constructed at step 308.

[0184] FIG.4 depicts a flowchart of an example method 400 for applying textures and/or materials to surfaces of a three-dimensional (3D) model. In some examples, the method is performed by a modeling system 160.

[0185] At step 402, optionally, a first 3D model of the object is obtained. In some examples, the first 3D model of the object is obtained from a vector database. The first 3D model of the object can be obtained from the database based on a query (e.g., user-provided query). For example, the first 3D model can be retrieved based on at least one of a received description of the object, one or more images of the object (or similar object), or a combination thereof.

[0186] At step 404, a first encoding (e.g., first embedding) of a first image of the object is obtained. In some examples, the first image is an image of a physical object (e.g., an image captured by a camera). In some examples, the first image is derived from a 3D model of the
object (e.g., a 3D model of the object constructed during a previous iteration of steps 404-410). In some examples, the first image is a rendered view or at least a portion of a view of a surface of the first 3D model of the object. In some examples, the first image depicts a view of the object from a viewpoint. The first image of the object can be obtained from a database (e.g., vector database 128), provided by a view capturer 152, or obtained from any other suitable source.

[0187] In some examples, an encoder of the image-generating model 120 extracts or otherwise derives the first encoding from the first image. In some examples, the first image and the first encoding corresponding thereto are obtained from a vector database 128 based on a query. The query can include a description of a texture and/or material.

[0188] At step 406, the image-generating model 120 generates a second encoding of a second image. In some examples, the second encoding is based on the first encoding and conditioning data. The conditioning data can include any information related to textures and/or materials. For example, the conditioning data can include a description of one or more visual attributes and/or optical attributes of a surface, such as a description of a texture, material, shading, lighting, reflectivity, and/or color of the surface.

[0189] In some examples, the second encoding is a second embedding, and the second embedding is generated by conditioning an image generation process performed in a latent space of the image-generating model 120 on the first embedding and/or on the conditioning data. For example, generating the second embedding can include providing the first embedding as input to the image-generating model 120. The image-generating model 120 can include any combination of models for generating images from one or more embeddings. For example, the image-generating model 120 can include a latent, text-to-image diffusion model.

[0190] At step 408, the texture application tool 190 maps the second image to a surface of a three-dimensional (3D) model of an object, thereby defining a texture and/or material of the surface of the 3D model. After one or more textures and/or materials have been applied to one or more surfaces of the 3D model, a second 3D model of the object is effectively created.

[0191] At step 410, the system 160 determines whether the texture and/or material of the surface of the 3D model satisfy one or more criteria. If so, the method 400 can end. Otherwise, the method can return to step 404 and repeat the following steps until a satisfactory texture mapping is obtained. The system 160 can determine whether the texture and/or material of the surface of the 3D model are satisfactory using any suitable technique. For example, the determination can
be based on a user input and/or on the system's analysis of one or more attributes of the 3D model constructed at step 408.

[0192] FIG.5 depicts a flowchart of an example method 500 for constructing (e.g., reconstructing) a 3D model of an object. In some examples, the method includes steps 502-508. In some examples, the method is performed by a modeling system.

[0193] At step 502, first encodings (e.g., first embeddings) of first images of the object are obtained. In some examples, the first images include images of a physical object (e.g., images captured by a camera). In some examples, the first images include images derived from a 3D model of the object (e.g., a 3D model of the object constructed during a previous iteration of steps 502-508). For example, the first images can include synthetically created images of the object, which can include rendered images of a 3D model of the object. In some examples, the first images depict views of the object from a set of viewpoints. In some examples, the first images of the object are provided by a view capturer 152. In some examples, the view capturer 152 provides the first images of the object using the view generation method 200. In some examples, the first images of the object can be obtained from a database (e.g., vector database 128) or from any other suitable source.

[0194] In some examples, an encoder of the image-generating model 120 extracts or otherwise derives the first encodings from the first images. In some examples, one or more of the first images and the encodings corresponding thereto are obtained from a vector database 128 based on a query. The query can include a description of the object, one or more images of the object (or similar object), or a combination thereof.

[0195] At step 504, the image-generating model 120 generates second encodings of second images of the object. The second encodings may be based on the first encodings and on conditioning data. The conditioning data can include a description (e.g., text description) of the object. In some examples, the description of the object includes a description of one or more geometric attributes of the object, one or more visual attributes of the object, and/or one or more optical attributes of the object. In some examples, the conditioning data include a description of one or more alterations to (1) an aesthetic of the object as depicted in the first images or represented in an existing 3D model of the object, (2) a geometric attribute of the object as depicted in the first images or represented in the existing 3D model of the object, (3) a visual attribute of the object as depicted in the first images or represented in the existing 3D model of
Docket No. 212860-700001/PCT the object, (4) an optical attribute of the object as depicted in the first images or represented in the existing 3D model of the object, etc. [0196] In some examples, the second encodings are second embeddings, and the second embeddings are generated by conditioning an image generation process performed in a latent space of the image-generating model 120 on the first embeddings and/or on the conditioning data. For example, generating the second embeddings can include providing the first embeddings as input to the image-generating model 120. The image-generating model 120 can include any combination of models for generating images from one or more embeddings. For example, the image-generating model 120 can include a latent, text-to-image diffusion model. [0197] At step 506, the 3D model generator 150 constructs, based on the second encodings, a 3D model (e.g., a new or updated 3D model) of the object, which indicates (1) a geometric form of the object and (2) a texture and/or material of at least one surface of the object. [0198] At step 508, the system determines whether the 3D model of the object satisfies one or more criteria. If so, the method 500 can end. Otherwise, the method can return to step 502 and repeat steps 502-508 until the 3D model of the object satisfies one or more criteria. The system 100 can determine whether the 3D model satisfies one or more criteria using any suitable technique. For example, the determination can be based on a user input and/or on the system’s analysis of one or more attributes of the 3D model. In some examples, the criteria include (1) receipt of user input indicating that the 3D model of the object is satisfactory, (2) receipt of user input requesting termination of modeling, (3) expiry of a maximum time period allocated for the modeling, and/or (4) use of a maximum amount of computational resources allocated for the modeling. [0199] In some examples, a method for generating 3D objects using AI involves (1) obtaining a database of embeddings representing artistic content associated with a user and/or a project, (2) selecting and configuring CAD tools suitable for generating 3D objects consistent with that artistic content (a process sometimes referred to herein as “tool-forming”), and (3) generating 3D objects consistent with that artistic content via an iterative process of synthetic photogrammetry and latent space conditioning. Some examples of this method are described in further detail below and in the section titled “Some Examples.” [0200] In some examples, agents (e.g., expert systems) can select CAD tools to be used in the object-generating process and configure those tools (e.g., set values of the tools’ parameters) to ACTIVE 708712445v1 35
Docket No. 212860-700001/PCT create 3D objects having aesthetic qualities that match or align with the aesthetic qualities of the user- and/or project-specific artistic content. Such agents also may control the operation of the CAD tools during the process of generating the 3D objects. See, e.g., the description of “tool- forming” in the section titled “Some Examples.” [0201] In some examples, an AI-based system may be designed to simulate, mimic, and/or imitate the behaviors and/or decision-making of an expert human graphical artist. In one example, the AI-based system may determine, identify, and/or extrapolate the goals and/or vision for a 3D graphical object by interpreting a request, prompt, and/or input that initiates the object- creation process in view of the contents of the workspace from which the request, prompt, and/or input originates. By doing so, the AI-based system may be able to effectuate and/or realize the goals and/or vision for the 3D graphical object with improved accuracy and/or precision. [0202] In some examples, the AI-based system may perform and/or execute a tool-forming operation by selecting and/or identifying the appropriate authoring tool(s), settings, and/or artistic technique(s) based at least in part on the structured framework (e.g., goals, constraints, and/or corresponding actionable parameters). In one example, the AI-based system may select suitable authoring tool(s) from a range of options. In certain implementations, such authoring tools may include and/or represent software modules that form, shape, and/or manipulate certain features of the 3D graphical object. Examples of such authoring tools include, without limitation, modeling modules, texturing modules, rigging modules, animation modules, combinations or variations of one or more of the same, and/or any other suitable authoring tools. [0203] In some examples, the AI-based system may select suitable authoring tool(s) to ensure that the final object adheres to the creator’s goals and constraints. In one example, the AI-based system may create and/or generate synthetic data (e.g., textures, materials, modeling components, etc.) that facilitates enhancing the realism and/or accuracy of the resulting 3D graphical object. Additionally or alternatively, the AI-based system may condition and/or train AI models involved in the object-creation process based at least in part on the synthetic data. By doing so, the AI-based system may improve the AI-models’ ability to produce high-quality, realistic 3D graphical objects. [0204] In certain implementations, the AI-based system may integrate the synthetic data into the tool-forming operation to enhance the realism and/or accuracy of the resulting 3D graphical object. In one example, the tool-forming operation may provide, serve as, and/or function as the ACTIVE 708712445v1 36
Docket No. 212860-700001/PCT foundation and/or basis for the actual creation of the 3D graphical object. By integrating the synthetic data into the tool-forming process, the AI-based system may be able to model, texture, and/or animate the 3D graphical object to achieve a realistic, high-quality result and/or outcome. For example, the synthetic data may enable the AI-based system to implement improved texture mapping, material application, and/or modeling during the object-creation process. [0205] In some examples, when helpful to satisfy the creator’s goals and/or constraints, the rigging of the 3D graphical object may involve creating and/or setting a skeleton structure within the underlying object to control the movement and/or animation. In one example, the animation of the 3D graphical object may involve causing the skeleton structure of the object to move, act, and/or behave in certain ways to satisfy the creator’s goals and/or constraints. [0206] In some examples, the AI-based system may perform and/or execute an adversarial fine- tuning operation that reduces the need for human oversight and/or intervention. In one example, such adversarial fine-tuning may involve the AI-based system applying adversarial training techniques to improve the AI models’ abilities to create high-quality 3D graphical objects by updating and/or adjusting the models’ parameters and/or internal representations. For example, the AI-based system may present the AI-models with challenging scenarios that force the models to learn more robust and/or accurate representations. [0207] In some examples, the AI-based system may complete the object-creation process by preparing the 3D graphical object for integration into the creator’s project (e.g., a video game, simulation, etc.). In one example, the AI-based system may implement certain quality-assurance checks on the resulting 3D graphical object. In this example, the AI-based system may also optimize the 3D graphical object to perform satisfactorily in the creator’s target environment. Such optimization may involve reducing the polygon count, compressing textures, and/or adjusting the geometric form of the 3D graphical object. The AI-based system may then package the 3D graphical object for export to a format compatible with the creator’s project and/or target environment. [0208] In some examples, the AI-based system may implement any of a variety of different types of AI models and/or techniques to achieve the objectives described herein. Examples such AI models include, without limitation, machine learning models, convolutional neural networks, recurrent neural networks, supervised learning models, unsupervised learning models, linear regression models, logistic regression models, support vector machine models, Naive Bayes ACTIVE 708712445v1 37
Docket No. 212860-700001/PCT models, k-nearest neighbor models, k-means models, random forest models, combinations or variations of one or more of the same, and/or any other suitable power models. Computer-Based Implementations [0209] Techniques operating according to the principles described herein may be implemented in any suitable manner. Included in this disclosure is a description of the steps and acts of various processes that generate 3D objects. The processing and decision blocks of the flow charts above represent steps and acts that may be included in algorithms that carry out these various processes. Algorithms derived from these processes may be implemented as software integrated with and directing the operation of one or more single- or multi-purpose processors, may be implemented as functionally-equivalent circuits such as a Digital Signal Processing (DSP) circuit, Field Programmable Gate Array (FPGA), or an Application-Specific Integrated Circuit (ASIC), or may be implemented in any other suitable manner. It should be appreciated that the flow charts included herein do not depict the syntax or operation of any particular circuit or of any particular programming language or type of programming language. Rather, the flow charts illustrate the functional information one of ordinary skill in the art may use to fabricate circuits or to implement computer software algorithms to perform the processing of a particular apparatus carrying out the types of techniques described herein. It should also be appreciated that, unless otherwise indicated herein, the particular sequence of steps and/or acts described in each flow chart is merely illustrative of the algorithms that may be implemented and can be varied in implementations and embodiments of the principles described herein. [0210] Accordingly, in some embodiments, the techniques described herein may be embodied in computer-executable instructions implemented as software, including as application software, system software, firmware, middleware, embedded code, or any other suitable type of software. Such computer-executable instructions may be written using any of a number of suitable programming languages and/or programming or scripting tools, and also may be compiled as executable machine language code or intermediate code that is executed on a framework or virtual machine. [0211] When techniques described herein are embodied as computer-executable instructions, these computer-executable instructions may be implemented in any suitable manner, including as a number of functional facilities, each providing one or more operations to complete execution of ACTIVE 708712445v1 38
Docket No. 212860-700001/PCT algorithms operating according to these techniques. A “functional facility,” however instantiated, is a structural component of a computer system that, when integrated with and executed by one or more computers, causes the one or more computers to perform a specific operational role. A functional facility may be a portion of or an entire software element. For example, a functional facility may be implemented as a function of a process, or as a discrete process, or as any other suitable unit of processing. If techniques described herein are implemented as multiple functional facilities, each functional facility may be implemented in its own way; all need not be implemented the same way. Additionally, these functional facilities may be executed in parallel and/or serially, as appropriate, and may pass information between one another using a shared memory on the computer(s) on which they are executing, using a message passing protocol, or in any other suitable way. [0212] Generally, functional facilities include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Typically, the functionality of the functional facilities may be combined or distributed as desired in the systems in which they operate. In some implementations, one or more functional facilities carrying out techniques herein may together form a complete software package. These functional facilities may, in alternative embodiments, be adapted to interact with other, unrelated functional facilities and/or processes, to implement a software program application. [0213] Some exemplary functional facilities have been described herein for carrying out one or more tasks. It should be appreciated, though, that the functional facilities and division of tasks described is merely illustrative of the type of functional facilities that may implement the exemplary techniques described herein, and that embodiments are not limited to being implemented in any specific number, division, or type of functional facilities. In some implementations, all functionality may be implemented in a single functional facility. It should also be appreciated that, in some implementations, some of the functional facilities described herein may be implemented together with or separately from others (i.e., as a single unit or separate units), or some of these functional facilities may not be implemented. [0214] Computer-executable instructions implementing the techniques described herein (when implemented as one or more functional facilities or in any other manner) may, in some embodiments, be encoded on one or more computer-readable media to provide functionality to the media. Computer-readable media include magnetic media such as a hard disk drive, optical ACTIVE 708712445v1 39
Docket No. 212860-700001/PCT media such as a Compact Disk (CD) or a Digital Versatile Disk (DVD), a persistent or non- persistent solid-state memory (e.g., Flash memory, Magnetic RAM, etc.), or any other suitable storage media. Such a computer-readable medium may be implemented in any suitable manner, including as computer-readable storage media 606 of FIG. 6 described below (i.e., as a portion of a computing device 600) or as a stand-alone, separate storage medium. As used herein, “computer-readable media” (also called “computer-readable storage media”) refers to tangible storage media. Tangible storage media are non-transitory and have at least one physical, structural component. In a “computer-readable medium,” as used herein, at least one physical, structural component has at least one physical property that may be altered in some way during a process of creating the medium with embedded information, a process of recording information thereon, or any other process of encoding the medium with information. For example, a magnetization state of a portion of a physical structure of a computer-readable medium may be altered during a recording process. [0215] Further, some techniques described above comprise acts of storing information (e.g., data and/or instructions) in certain ways for use by these techniques. In some implementations of these techniques—such as implementations where the techniques are implemented as computer- executable instructions—the information may be encoded on a computer-readable storage media. Where specific structures are described herein as advantageous formats in which to store this information, these structures may be used to impart a physical organization of the information when encoded on the storage medium. These advantageous structures may then provide functionality to the storage medium by affecting operations of one or more processors interacting with the information; for example, by increasing the efficiency of computer operations performed by the processor(s). [0216] In some, but not all, implementations in which the techniques may be embodied as computer-executable instructions, these instructions may be executed on one or more suitable computing device(s) operating in any suitable computer system, including the exemplary computer systems described herein, or one or more computing devices (or one or more processors of one or more computing devices) may be programmed to execute the computer- executable instructions. A computing device or processor may be programmed to execute instructions when the instructions are stored in a manner accessible to the computing device/processor, such as in a local memory (e.g., an on-chip cache or instruction register, a ACTIVE 708712445v1 40
Docket No. 212860-700001/PCT computer-readable storage medium accessible via a bus, a computer-readable storage medium accessible via one or more networks and accessible by the device/processor, etc.). Functional facilities that comprise these computer-executable instructions may be integrated with and direct the operation of a single multi-purpose programmable digital computer apparatus, a coordinated system of two or more multi-purpose computer apparatuses sharing processing power and jointly carrying out the techniques described herein, a single computer apparatus or coordinated system of computer apparatuses (co-located or geographically distributed) dedicated to executing the techniques described herein, one or more Field-Programmable Gate Arrays (FPGAs) for carrying out the techniques described herein, or any other suitable system. [0217] FIG.6 illustrates one exemplary implementation of a computing device in the form of a computing device 600 that may be used in a system implementing the techniques described herein, although others are possible. It should be appreciated that FIG.6 is intended neither to be a depiction of necessary components for a computing device to operate in accordance with the principles described herein, nor a comprehensive depiction. [0218] Computing device 600 may comprise one or more processors 602, a network adapter 604, and computer-readable storage media 606. Computing device 600 may be, for example, a desktop or laptop personal computer, a personal digital assistant (PDA), a smart mobile phone, a server, a wireless access point or other networking element, or any other suitable computing device. Network adapter 604 may be any suitable hardware and/or software to enable the computing device 600 to communicate wired and/or wirelessly with any other suitable computing device over any suitable computing network. The computing network may include wireless access points, switches, routers, gateways, and/or other networking equipment as well as any suitable wired and/or wireless communication medium or media for exchanging data between two or more computers, including the Internet. Computer-readable storage media 606 may be adapted to store data to be processed and/or instructions to be executed by one or more processors 602. The one or more processors 602 enable processing of data and execution of instructions. The data and instructions may be stored on the computer-readable storage media 606. [0219] The data and instructions stored on computer-readable storage media 606 may comprise computer-executable instructions implementing techniques which operate according to the principles described herein. In the example of FIG.6, computer-readable storage media 606 ACTIVE 708712445v1 41
Docket No. 212860-700001/PCT stores computer-executable instructions implementing various facilities and storing various information as described above. Computer-readable storage media 606 may store an object generation facility 608 that implements one or more of the tools and/or methods described herein, and modeling data 610 comprising the objects, models, embeddings, and other data described herein. [0220] While not illustrated in FIG. 6, a computing device may additionally have one or more components and peripherals, including input and output devices. These devices can be used, among other things, to present a user interface. Examples of output devices that can be used to provide a user interface include printers or display screens for visual presentation of output and speakers or other sound generating devices for audible presentation of output. Examples of input devices that can be used for a user interface include keyboards and pointing devices, such as mice, touch pads, and digitizing tablets. As another example, a computing device may receive input information through speech recognition or in another audible format. Some Examples [0221] This section describes some examples of implementations of systems and methods for generative creation of 3D models (e.g., production-ready 3D objects). Embodiments are not limited to operating in accordance with these illustrative examples, as other embodiments are possible. [0222] Disadvantages of Existing Technologies [0223] In recent years, the field of 3D model (e.g., 3D object) generation has seen significant advancements, largely driven by the integration of AI tools. Traditional approaches to 3D object generation have primarily focused on reconstruction techniques, where AI tools attempt to recreate or mimic existing 3D objects based on input data such as images or scans. While these methods have shown promise, they often bypass the nuanced workflows and expert decisions that are inherent in professional 3D object creation. [0224] Professional 3D object creation involves a complex pipeline that includes tasks such as modeling, texturing, rigging, and animation. Many steps in this pipeline rely on specialized knowledge and skills, which are typically acquired through years of experience in the field. By skipping these professional workflows, existing AI solutions may produce 3D objects that lack the quality, detail, and authenticity that are characteristic of professionally created models. ACTIVE 708712445v1 42
Docket No. 212860-700001/PCT [0225] This section describes some examples of implementations of AI tools that address this gap by adopting a constructive approach to 3D object generation. Instead of merely reconstructing existing models, some examples of the AI tools mimic the professional object creation pipeline through an expert system. This expert system may replicate the decision-making processes of skilled artists and technicians at each stage of the 3D object creation pipeline. By doing so, some examples of the AI tools ensure that the generated 3D objects are not only realistic but also adhere to the standards and practices of professional object creation. [0226] The constructive approach employed by some examples of the AI tools represents a significant departure from conventional reconstruction-based methods. By integrating the expert knowledge and workflows of professional 3D object creation, some examples of the AI tools offer a more authentic and high-quality solution for 3D object generation. [0227] Summary [0228] The present section introduces some examples of an innovative approach to 3D object generation that integrates deep 3D object understanding and workspace conditioning to produce high-quality 3D objects. This approach may include an ability to interpret creator requests and generate corresponding goals and constraints, which guide the object creation process. [0229] Some examples of the tools described herein include an expert system that acts as a virtual technical artist. In some examples, this system possesses extensive knowledge of professional 3D object creation workflows and techniques. Utilizing this expertise, the expert system can make informed decisions and execute actions that replicate those of a skilled human artist. [0230] In some examples, the “tool-forming” process involves dynamically configuring and manipulating the tools and resources within a professional 3D object creation workspace to achieve the desired outcomes. In some examples, the expert system uses the goals and constraints derived from the creator’s request to steer the tool-forming process, ensuring that the final 3D object meets the specified requirements. [0231] In some examples, workspace conditioning involves analyzing the creator’s workspace, which may contain concept art, reference images, design documents, and other relevant materials. By incorporating this information, the system can further refine the goals and constraints, leading to a 3D object that is not only of high quality but also closely aligned with the creator’s creative vision. ACTIVE 708712445v1 43
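By way of illustration only, the following Python sketch outlines how the high-level flow summarized above (interpreting a creator request, conditioning on the workspace, deriving goals and constraints, and iterating until the result is satisfactory) might be orchestrated. The names used (e.g., Spec, ConstructionResult, construct_object, and the callables passed in) are hypothetical placeholders and do not correspond to any particular component described herein.

from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Spec:
    """Structured goals and constraints derived from a creator request."""
    goals: List[str] = field(default_factory=list)
    constraints: List[str] = field(default_factory=list)

@dataclass
class ConstructionResult:
    model_path: str
    satisfactory: bool

def construct_object(request: str,
                     analyze_workspace: Callable[[], Dict],
                     derive_spec: Callable[[str, Dict], Spec],
                     build_model: Callable[[Spec], ConstructionResult],
                     max_iterations: int = 5) -> ConstructionResult:
    """Hypothetical top-level loop: workspace analysis, goal and constraint
    generation, tool-forming and construction, and evaluation, repeated until
    the result is satisfactory or the iteration budget is spent."""
    workspace_context = analyze_workspace()          # workspace conditioning
    spec = derive_spec(request, workspace_context)   # goals and constraints
    result = ConstructionResult(model_path="", satisfactory=False)
    for _ in range(max_iterations):
        result = build_model(spec)                   # tool-forming and object creation
        if result.satisfactory:
            break
    return result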
Docket No. 212860-700001/PCT [0232] In some examples, the disclosed tools provide a comprehensive solution for 3D object generation that combines deep model understanding, creator-centric customization, and expert- guided creation. In some examples, this solution results in a system capable of producing professional-grade 3D objects efficiently and effectively, catering to the needs of various industries, including gaming, film, and virtual or augmented reality. [0233] Terms [0234] The illustrative descriptions of the following terms are provided by way of example and are not limiting. [0235] Creator: In some examples, a creator is an individual or entity that engages with AI tools through natural language requests and/or creative inputs for technical art assistance. [0236] Workspace: In some examples, a workspace includes a collection of code, models, and other relevant materials used by the creator, optionally organized in a repository format common in game development and production projects. [0237] Assistant: In some examples, an assistant includes an AI agent that interprets the creator’s intentions and translates them into computational operations for technical art creation. [0238] Expert System: In some examples, an expert system includes an inference engine that structures inputs, knowledge, processes, and outputs, providing a framework for AI tools to make informed decisions in the technical art domain. [0239] 3D Object: In some examples, a 3D object includes a digital object, model, and/or asset used in 3D environments, such as games or simulations, which can include textures, materials, and animations. [0240] Authoring Tool: In some examples, an authoring tool includes a software application or tool used for creating 3D objects, which is learned and operated by AI agents through the process of “tool-forming,” as guided by the expert system. [0241] Knowledge Base: In some examples, a knowledge base includes a vector database (e.g., neural semantic database) that stores structured knowledge and information used by an expert system or AI model to make decisions and guide the creation of 3D objects. [0242] 3D Object Embeddings: In some examples, 3D object embeddings include feature representations extracted from 3D objects using multi-modal models (e.g., encoders), which capture various aspects of the models for use in AI processes. ACTIVE 708712445v1 44
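By way of illustration only, the following Python sketch shows one way 3D object embeddings of the kind described above might be computed from rendered views and stored in a simple in-memory stand-in for a vector database. The random-projection "encoder" and the names VectorKnowledgeBase and embed_views are hypothetical placeholders; an actual system would use a trained multi-modal encoder and a production vector database.

import numpy as np
from typing import List

class VectorKnowledgeBase:
    """Minimal in-memory stand-in for a vector database of 3D object embeddings."""
    def __init__(self, dim: int):
        self.dim = dim
        self.keys: List[str] = []
        self.vectors: List[np.ndarray] = []

    def add(self, key: str, embedding: np.ndarray) -> None:
        # Store unit-normalized vectors so cosine similarity reduces to a dot product.
        self.keys.append(key)
        self.vectors.append(embedding / (np.linalg.norm(embedding) + 1e-9))

def embed_views(views: List[np.ndarray], dim: int = 64) -> np.ndarray:
    """Stand-in encoder: project flattened rendered views to a fixed-size vector
    and average them into a single object-level embedding."""
    rng = np.random.default_rng(0)                       # fixed projection for repeatability
    projection = rng.standard_normal((views[0].size, dim))
    per_view = np.stack([view.reshape(-1) @ projection for view in views])
    return per_view.mean(axis=0)

# Usage: embed two synthetic 32x32 "views" of an object and store the result.
kb = VectorKnowledgeBase(dim=64)
views = [np.random.rand(32, 32), np.random.rand(32, 32)]
kb.add("crate_01", embed_views(views))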
Docket No. 212860-700001/PCT [0243] Retrieval Augmented Generation (RAG): In some examples, RAG is a technique whereby information retrieved from a source external to a generative model (e.g., from a vector database) is incorporated into the generative model’s context. RAG can enhance the accuracy and relevance of the content generated by the generative model. [0244] RAC (Retrieval Augmented Construction): In some examples, RAC is a technique analogous to RAG, whereby conditioning data retrieved from a source (e.g., vector database) external to a generative model (e.g., image-generating model) is used to condition the process by which the generative model generates content (e.g., images) used to construct 3D models. RAC can infuse the constructed model with attributes or an aesthetic identified in user-provided input. [0245] Tool-forming 3D Pipeline: In some examples, a tool-forming 3D pipeline includes a pipeline specifically designed to create tools configured (e.g., optimized) for use by AI in 3D content creation. It incorporates expert systems for symbolic reasoning, synthetic data generation, and specialized servers for tasks like texture synthesis and handling complex multi- object scenarios. [0246] Workspace Conditioning: In some examples, workspace conditioning includes an AI training method that tailors an AI model (e.g., conditioning model) to a user’s specific work environment. Workspace conditioning can involve multi-modal indexing of the user’s workspace, including files, documents, and models, to create a detailed contextual understanding. [0247] Multi-Modal Workspace Indexing: In some examples, multi-modal workspace indexing is a process of indexing various types of data in the user’s workspace, such as filesystem repositories, documents, and existing models, to provide a comprehensive view of the workspace for AI conditioning. [0248] Adversarial Fine-Tuning: A technique used to refine AI models by reducing reliance on human oversight. It is part of the workspace conditioning process. [0249] Deep ETL (Extract, Transform, Load): Heuristics that offers deep insights into 3D objects by data extraction, transformation, and loading process. [0250] Synthetic Photogrammetry Models: The creation of synthetic models using photogrammetry techniques, which are then used for conditioning AI models. ACTIVE 708712445v1 45
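By way of illustration only, the following Python sketch shows one way the retrieval step of RAC might be realized: stored embeddings most similar to a query embedding are retrieved by cosine similarity and passed as conditioning to a downstream generation step. The function names and the placeholder generator are hypothetical and are not part of any specific implementation described herein.

import numpy as np
from typing import Callable, Dict, List

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def retrieve_conditioning(query: np.ndarray,
                          stored: Dict[str, np.ndarray],
                          k: int = 2) -> List[np.ndarray]:
    """Return the k stored embeddings most similar to the query embedding."""
    ranked = sorted(stored.items(),
                    key=lambda item: cosine_similarity(query, item[1]),
                    reverse=True)
    return [vector for _, vector in ranked[:k]]

def retrieval_augmented_construction(query: np.ndarray,
                                     stored: Dict[str, np.ndarray],
                                     generate: Callable[[List[np.ndarray]], np.ndarray]) -> np.ndarray:
    """RAC sketch: condition an (abstract) generation step on retrieved embeddings
    rather than on the query alone."""
    conditioning = retrieve_conditioning(query, stored)
    return generate(conditioning)

# Usage with toy data and a placeholder generator that simply averages the
# conditioning vectors.
store = {name: np.random.rand(8) for name in ("oak_texture", "pine_texture", "steel_texture")}
query_vec = np.random.rand(8)
result = retrieval_augmented_construction(query_vec, store,
                                           generate=lambda embs: np.mean(embs, axis=0))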
Docket No. 212860-700001/PCT [0251] Latent Space: A multi-dimensional space where the internal representations of one or more AI tools’ data are stored. In some examples, the latent space is used for conditioning an AI tool’s outputs by directly manipulating its internal representations. [0252] An Example Construction Process [0253] 1. Creative Input: The creator engages with the AI tools by providing natural language requests and creative inputs through the Assistant. These inputs can include descriptions, concept art, reference images, and design documents. [0254] 2. Workspace Analysis: The system analyzes the creator’s Workspace, which contains code, models, and other relevant materials. This step involves Multi-Modal Workspace Indexing to create a detailed contextual understanding of the creator’s environment. [0255] 3. Goal and Constraint Generation: Based on the creative input and workspace analysis, the system generates goals and constraints for the 3D object creation process. This step involves the Expert System, which uses its structured knowledge to interpret the creator’s intentions. [0256] 4. Tool-forming: The system selects and configures the appropriate Authoring Tools for the task at hand. This process, known as “tool-forming,” is guided by the Expert System, which determines the best tools and settings to use based on the goals and constraints. [0257] 5. Synthetic Data Generation: To enhance the realism and accuracy of the 3D object, the system generates synthetic data, such as textures and materials, using techniques like Synthetic Photogrammetry Models. This data is used to condition AI models and improve the quality of the final model. [0258] 6. Object creation: With the tools configured and synthetic data generated, the system begins the object creation process. This involves using the selected Authoring Tools to model, texture, and animate the 3D object according to the specified goals and constraints. [0259] 7. Adversarial Fine-Tuning: Throughout the object creation process, the system employs Adversarial Fine-Tuning to refine the AI models and reduce reliance on human oversight. This step ensures that the generated model meets the high-quality standards required for production. [0260] 8. Deep ETL (Extract, Transform, Load) : To gain deeper insights into the 3D objects, the system uses a Reverse-Engineering ETL Model. This model helps in understanding the structure and properties of the models, which can be used for further refinement. ACTIVE 708712445v1 46
Docket No. 212860-700001/PCT [0261] 9. Latent Space Manipulation: The system fine-tunes the outputs of an AI tool by directly manipulating its internal representations in the Latent Space. This allows for precise adjustments to the 3D object, ensuring that it aligns with the creator’s vision and the technical requirements of the project. [0262] 10. Production-Ready Model: The final step involves generating a production-ready 3D object that meets the creator’s specifications and is suitable for use in games, simulations, or other 3D-graphics environments. This model is then delivered to the creator for integration into their project. [0263] Some Embodiments of Steps of a Construction Process [0264] Creative Input [0265] The Creative Input step is the initial phase where the creator engages with the AI tools to provide the foundational inputs for the 3D object creation process. This engagement can occur through various channels, including natural language requests, creative inputs, and point-and- click 3D applications. The inputs provided by the creator may shape the direction and outcome of the 3D object creation process. [0266] Natural Language Requests: Creators can communicate their requirements and specifications for the 3D object using natural language. This is facilitated by large language models that interpret the creator’s requests and translate them into structured instructions for some examples of an AI system. The use of natural language allows creators to express their ideas and visions in a flexible and intuitive manner. [0267] Creative Inputs: In addition to textual descriptions, creators can provide a range of creative inputs to further define their vision for the 3D object. These inputs can include concept art, reference images, design documents, and even music and sound effects. These materials help to provide a richer context for the object creation process and enable some examples of an AI system to better understand the desired aesthetic and functional attributes of the final model. [0268] Point-and-Click 3D Applications: For a more interactive engagement, creators can use point-and-click 3D applications to provide inputs. These applications allow creators to visually select and manipulate elements in a 3D environment, offering a more direct and intuitive way to convey their requirements for the model. [0269] The Creative Input step may help establish a clear and comprehensive understanding of the creator’s vision and requirements. By leveraging natural language understanding, creative ACTIVE 708712445v1 47
Docket No. 212860-700001/PCT inputs, and interactive 3D applications, the AI tools can accurately interpret the creator’s intentions and set the stage for the subsequent steps in the object creation process. [0270] Workspace Analysis [0271] In the Workspace Analysis step, the AI tools systematically examine the creator’s Workspace, which encompasses the code, models, and other materials relevant to the 3D object creation project. This analysis may help the tools to understand the context and constraints within which the model is being developed. Some examples of workspace content are shown in Figs. 7 and 8. [0272] Workspace conditioning is an innovative AI training method that tailors AI models to a user’s specific work environment. It involves multi-modal indexing of the user's workspace, including files, documents, and objects, to create a detailed contextual understanding. Embeddings generated from this data are stored in a vector database, allowing for efficient neural searching. In some examples, AI models are further refined through adversarial fine- tuning, reducing reliance on human oversight. A reverse-engineering ETL model offers deep insights into game objects. [0273] Multi-Modal Workspace Indexing: The system employs a technique known as Multi- Modal Workspace Indexing to create a comprehensive overview of the creator’s environment. This involves analyzing various types of data present in the workspace, including filesystem repositories, documents, existing models, and potentially codebases. The goal is to capture a detailed understanding of the workspace’s structure, content, and the relationships between different elements. [0274] Multi-modal workspace indexing may include indexing of the file system, repositories, documents, game concepts, existing objects, etc. In some embodiments, indexing of the user- space may include indexing of cloud workspaces, uploaded owned objects, fine-tuning image datasets, bring-your-own-models, etc. In some embodiments, such indexing may provide deep understanding of objects in any suitable format (e.g., OBJ, FBX, GLTF, etc.). [0275] Contextual Understanding: By analyzing the workspace, the system gains insights into the creator’s project, including the style, themes, and technical specifications that might influence the design and development of the 3D object. This contextual understanding may be helpful in tailoring the object creation process to fit seamlessly into the larger project. ACTIVE 708712445v1 48
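By way of illustration only, the following Python sketch shows one way multi-modal workspace indexing could be organized: files in the workspace are routed to a per-modality encoder based on their extension, and the resulting embeddings are collected for insertion into a vector database. The suffix-to-modality mapping and the stub encoders are hypothetical; an actual indexer would also cover code, audio, and project-specific formats and would use trained encoders.

from pathlib import Path
from typing import Callable, Dict, List, Tuple

# Hypothetical mapping from file suffix to a modality label.
MODALITY_BY_SUFFIX = {
    ".obj": "model", ".fbx": "model", ".gltf": "model",
    ".png": "image", ".jpg": "image",
    ".md": "document", ".txt": "document",
}

def index_workspace(root: str,
                    encoders: Dict[str, Callable[[Path], List[float]]]
                    ) -> List[Tuple[str, str, List[float]]]:
    """Walk the workspace and embed each recognized file with the encoder
    registered for its modality. Returns (path, modality, embedding) tuples."""
    index = []
    for path in Path(root).rglob("*"):
        modality = MODALITY_BY_SUFFIX.get(path.suffix.lower())
        if path.is_file() and modality in encoders:
            index.append((str(path), modality, encoders[modality](path)))
    return index

# Usage with a trivial stand-in encoder (file size as a one-dimensional "embedding").
stub_encoder = lambda p: [float(p.stat().st_size)]
entries = index_workspace(".", {"model": stub_encoder, "image": stub_encoder, "document": stub_encoder})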
Docket No. 212860-700001/PCT [0276] Integration with Creative Input: The information gleaned from the workspace analysis is integrated with the creative inputs provided by the creator in the previous step. This integration may ensure that the goals and constraints for the object creation are aligned with both the creator’s vision and the practical aspects of the project environment. [0277] Preparation for Goal and Constraint Generation: The insights obtained from the workspace analysis lay the groundwork for the next step in the process, which is the generation of specific goals and constraints for the 3D object. By thoroughly understanding the workspace, the system can produce more accurate and relevant goals and constraints that reflect the creator’s needs and the project context. [0278] The Workspace Analysis step may provide the system with a deep understanding of the creator’s project environment. This understanding enables some examples of AI tools to make informed decisions and produce 3D objects that are not only high-quality but also highly compatible with the creator’s overall project. [0279] Goal and Constraint Generation [0280] In this step, the AI tools leverage the insights gained from the Creative Input and Workspace Analysis to establish specific goals and constraints for the 3D object creation process. In this step, the system translates the creator’s vision and project context into actionable parameters that guide the development of the model. [0281] Expert System Involvement: The Expert System uses its structured knowledge base to interpret the creator’s intentions and the contextual information from the workspace analysis. The system then generates a set of goals that outline what the final model should achieve and constraints that define the boundaries within which the model is developed. [0282] Structured Knowledge Utilization: The Expert System utilizes its structured knowledge to ensure that the goals and constraints are comprehensive and relevant. This knowledge includes industry standards, best practices, and technical specifications related to 3D object creation. By leveraging this knowledge, the system can generate goals and constraints that are realistic, achievable, and aligned with professional standards. [0283] Alignment with Creative Input: The goals and constraints are closely aligned with the creative inputs provided by the creator. For example, if the creator has specified a particular style or theme, the goals reflect these aesthetic requirements, and the constraints ensure that the model adheres to these stylistic guidelines. ACTIVE 708712445v1 49
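By way of illustration only, the following Python sketch shows one way a creator request and workspace context might be translated into the kind of structured goals and constraints described above. The simple keyword rules stand in for the expert system's structured knowledge; the names GoalSet and derive_goals_and_constraints and the default limits are hypothetical.

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class GoalSet:
    """Actionable parameters derived from a creator request and workspace context."""
    aesthetic_goals: List[str] = field(default_factory=list)
    technical_constraints: Dict[str, int] = field(default_factory=dict)

def derive_goals_and_constraints(request: str, workspace_context: Dict) -> GoalSet:
    """Toy, rule-based stand-in for expert-system goal and constraint generation."""
    spec = GoalSet()
    if "stylized" in request.lower():
        spec.aesthetic_goals.append("match stylized art direction")
    if "realistic" in request.lower():
        spec.aesthetic_goals.append("use physically based materials")
    # Technical limits inherited from the project context (e.g., a target engine budget).
    spec.technical_constraints["max_polygons"] = workspace_context.get("poly_budget", 50000)
    spec.technical_constraints["texture_resolution"] = workspace_context.get("tex_res", 2048)
    return spec

# Usage
spec = derive_goals_and_constraints("a realistic wooden crate",
                                    {"poly_budget": 20000, "tex_res": 1024})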
Docket No. 212860-700001/PCT [0284] Technical and Aesthetic Considerations: The generation of goals and constraints can account for technical and/or aesthetic considerations. Technical constraints may include file size limits, polygon count, or compatibility with specific software, while aesthetic goals can involve achieving a certain level of realism, adhering to a color scheme, or matching a particular art style. [0285] Foundation for Tool-forming: The established goals and constraints serve as the foundation for the next step in the process, which is Tool-forming. They provide helpful guidance for selecting and configuring the appropriate authoring tools and techniques to create the 3D object. [0286] The Goal and Constraint Generation step may help translate the creator’s vision and project requirements into a structured framework that guides the entire object creation process. By clearly defining what the model should achieve and the parameters within which it should be developed, this step ensures that the final product is closely aligned with the creator’s expectations and project needs. [0287] Tool-forming [0288] In some examples of the Tool-forming step, the AI tools select and configure the appropriate authoring tools based on the goals and constraints established in the previous phase. This process may help ensure that the technical aspects of 3D object creation are aligned with the creator’s vision and project requirements. Fig.25A shows an example illustration of a portion of a tool-forming process. [0289] Expert System Guidance: The Expert System may determine the best tools and settings to use for the object creation process. It utilizes its structured knowledge base to match the specific goals and constraints with the capabilities of various authoring tools. Expert systems for symbolic reasoning may include decision trees and/or behavior trees. Such expert systems may help to solve the alignment problem. [0290] Selection of Authoring Tools: Based on the requirements of the 3D object, the system selects suitable authoring tools from a range of options. These tools can include software for modeling, texturing, rigging, animation, and other aspects of 3D object creation. The selection is made to ensure that the chosen tools are capable of achieving the desired outcomes while adhering to the defined constraints. ACTIVE 708712445v1 50
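By way of illustration only, the following Python sketch shows how tool selection and configuration might be expressed as simple decision-tree-like rules driven by the goals and constraints. The tool names and settings are illustrative placeholders only and do not correspond to any particular authoring tool.

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ToolConfig:
    name: str
    settings: Dict[str, object]

def form_tools(needs_animation: bool, max_polygons: int, stylized: bool) -> List[ToolConfig]:
    """Toy decision-tree-like selection and configuration of authoring tools."""
    tools = [ToolConfig("modeling", {"target_polycount": max_polygons})]
    tools.append(ToolConfig("texturing", {"style": "hand_painted" if stylized else "pbr"}))
    if needs_animation:
        # Rigging and animation are only configured when the goals call for movement.
        tools.append(ToolConfig("rigging", {"skeleton": "humanoid_basic"}))
        tools.append(ToolConfig("animation", {"fps": 30}))
    return tools

# Usage
pipeline = form_tools(needs_animation=True, max_polygons=20000, stylized=False)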
Docket No. 212860-700001/PCT [0291] Configuration and Customization: Once the appropriate tools have been selected, the system configures and customizes them to fit the specific needs of the project. This can involve setting parameters, choosing presets, or even creating custom scripts and plugins to extend the functionality of the tools. The aim is to configure (e.g., optimize) the tools for efficiency and effectiveness in the object creation process. [0292] Integration with Synthetic Data: In some cases, the tool-forming process may also involve integrating synthetic data generated in the previous step. This data, such as textures and materials, can be used to enhance the realism and accuracy of the 3D object. [0293] Tools for synthetic data generation may include Blender3D, UnrealEngine, etc. A Blender3D tool server may provide customized diffusion and procedural models, tools for texture synthesis (e.g., depths, normals, semseg, etc.), decision trees for multi-objects and/or multi-tasks (e.g., uv-mapping, texture painting, etc.). In some embodiments, inputs and outputs to such models and tools may be provided in JSON format. The Blender3D tool server may provide support for JSON as an intermediate language. [0294] In some embodiments, synthesis tools and techniques may include a photogrammetry synthesis viewer, synthetic photogrammetry objects generation tools, synthesized RGB-D + normal maps, a ControlNet-like workflow, NeRF conditioning tools, etc. [0295] Preparation for Object creation: The tool-forming step prepares the groundwork for the actual creation of the 3D object. With the tools selected and configured, the system is now equipped to start modeling, texturing, and animating the model according to the specified goals and constraints. [0296] The Tool-forming step may ensure that the technical tools and resources are properly aligned with the creative and project requirements. By carefully selecting and configuring the authoring tools, the system sets the stage for the efficient and effective creation of the 3D object. [0297] Fig. 14 shows a block diagram of an example tool-forming 3D pipeline. The Tool- forming 3D pipeline is specifically designed to create tools that are configured (e.g., optimized) for use by AI in 3D content creation. It incorporates expert systems for symbolic reasoning, using decision and behavior trees to ensure AI decisions are informed and aligned with objectives. Synthetic data generation, facilitated by platforms like Blender3D and UnrealEngine, are used for conditioning AI models and achieving realistic 3D objects. The pipeline includes a specialized Blender3D tool server, equipped with a JSON intermediate language and customized ACTIVE 708712445v1 51
Docket No. 212860-700001/PCT models for tasks like texture synthesis and handling complex multi-object scenarios. Similarly, the Photogrammetry tool server enhances surface reconstruction and enables novel view synthesis. A photogrammetry tool server may provide tools for customized surface reconstruction, procedural models, tools for novel view and shape synthesis, etc. [0298] Synthetic Data Generation [0299] In the Synthetic Data Generation step, the AI tools create synthetic data, such as textures and materials, to enhance the realism and accuracy of the 3D object. This process may help condition AI models and improve the quality of the final model. [0300] Use of Synthetic Photogrammetry Models: The generation of Synthetic Photogrammetry Models involves creating realistic textures, materials, and other model components using photogrammetry techniques. These synthetic models are used to provide detailed and accurate representations of real-world objects and surfaces. [0301] Conditioning AI Models: The synthetic data generated in this step is used to condition AI models involved in the object creation process. By training the AI models with this data, the system can improve their ability to produce high-quality, realistic 3D objects. The conditioning process helps the AI models learn from the synthetic data and apply this knowledge to the creation of the actual model. [0302] Integration with Tool-forming: The synthetic data is integrated with the authoring tools selected and configured in the Tool-forming step. This integration allows the tools to utilize the synthetic data for tasks such as texture mapping, material application, and other aspects of object creation that require realistic data. [0303] Enhancement of Model Quality: The use of synthetic data in the object creation process results in enhanced quality of the final 3D object. By providing detailed and accurate textures, materials, and other components, the synthetic data helps achieve a level of realism that is difficult to attain with traditional methods. [0304] Preparation for Object creation: The generation of synthetic data prepares the system for the actual creation of the 3D object in the next step. With the synthetic data ready, the system can proceed to model, texture, and animate the model with a focus on achieving the highest possible quality. ACTIVE 708712445v1 52
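By way of illustration only, the following Python sketch shows one way synthetic photogrammetry views might be planned and generated: camera positions are placed on a ring around the object, and an (abstract) renderer is invoked once per pose to produce RGB, depth, and normal maps for conditioning. The renderer here is a placeholder callable; an actual pipeline might drive Blender3D or another engine, and the function names are hypothetical.

import math
from typing import Callable, Dict, List, Tuple

Pose = Tuple[float, float, float]  # camera position (x, y, z), assumed to look at the origin

def ring_of_poses(radius: float, height: float, count: int) -> List[Pose]:
    """Evenly spaced viewpoints on a circle around the object, a simple stand-in
    for the calibrated extrinsics used in synthetic photogrammetry."""
    return [(radius * math.cos(2 * math.pi * i / count),
             height,
             radius * math.sin(2 * math.pi * i / count)) for i in range(count)]

def synthesize_views(render: Callable[[Pose], Dict[str, object]],
                     poses: List[Pose]) -> List[Dict[str, object]]:
    """Call an (abstract) renderer once per pose to collect RGB-D and normal maps."""
    return [render(pose) for pose in poses]

# Usage with a placeholder renderer that only records the pose.
views = synthesize_views(lambda pose: {"pose": pose, "rgb": None, "depth": None, "normals": None},
                         ring_of_poses(radius=3.0, height=1.5, count=8))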
Docket No. 212860-700001/PCT [0305] The Synthetic Data Generation step may ensure that the AI models and authoring tools have access to high-quality, realistic data. This step may contribute to the overall quality and realism of the final 3D object. [0306] Object creation [0307] The Object creation step is where the actual development of the 3D object takes place. Utilizing the authoring tools selected and configured in the Tool-forming step, and incorporating the synthetic data generated in the Synthetic Data Generation step, the system begins to model, texture, and animate the 3D object according to the specified goals and constraints. [0308] Modeling: The first task in object creation is modeling, where the basic shape and structure of the 3D object are constructed. Using the chosen authoring tools, the system creates the geometric form of the model, ensuring that it aligns with the aesthetic and functional requirements outlined in the goals. [0309] Texturing: Once the model is complete, the next step is texturing. In this phase, the synthetic textures and materials generated earlier are applied to the model to give it a realistic appearance. The system carefully maps these textures onto the model, paying attention to details such as color, reflectivity, and surface texture. [0310] Rigging and Animation: If the 3D object requires animation, the system proceeds to rigging, where a skeleton structure is created to control the movement of the object. Following rigging, the 3D object is animated according to the specified goals, which might include specific actions, behaviors, or movements. [0311] Quality Assurance: Throughout the object creation process, the system continuously checks the quality of the object to ensure that it meets the established standards. This includes verifying that the object adheres to the technical constraints, such as polygon count and file size, as well as ensuring that the aesthetic goals, such as style and realism, are achieved. [0312] Iterative Refinement: Object creation is often an iterative process, where the model undergoes multiple rounds of refinement based on feedback and evaluation. The system adjusts and improves to the model as appropriate to ensure that the final product meets the creator’s expectations and project requirements. [0313] The Object creation step is part of the AI object creation process, where the 3D object is brought to life through a combination of modeling, texturing, rigging, and animation. This step ACTIVE 708712445v1 53
Docket No. 212860-700001/PCT may help transform the creator’s vision and the system’s planning into a tangible, high-quality 3D object. [0314] Adversarial Fine-Tuning [0315] Adversarial Fine-Tuning may involve using the AI models in the creation of the 3D object. In the Adversarial Fine-Tuning step, the AI models are refined to reduce reliance on human oversight and improve the quality of the final product. In some embodiments, the use of adversarial fine-tuning may reduce or eliminate the need for human-in-the-loop activity. [0316] Refinement of AI Models: In this step, the AI models involved in the object creation process undergo fine-tuning to enhance their performance. This involves adjusting the models’ parameters and training them with additional data to improve their ability to generate high- quality models. [0317] Adversarial Training: This fine-tuning process may involve the use of adversarial training techniques. These techniques involve presenting the AI models with challenging scenarios or adversarial examples that force the models to learn more robust and accurate representations. This helps the models become more resilient and capable of handling a wider range of object creation tasks. [0318] Reduction of Human Oversight: By fine-tuning the AI models, the system aims to reduce the need for human oversight in the object creation process. The goal is to create a more autonomous system that can produce high-quality models with minimal intervention from human creators or technicians. [0319] Quality Assurance: Throughout the adversarial fine-tuning process, the quality of the AI-generated models is continuously monitored to ensure that they meet the established goals and constraints. The system evaluates the models for technical accuracy, aesthetic appeal, and adherence to project specifications. [0320] Iterative Improvement: Adversarial fine-tuning is an iterative process, where the AI models are repeatedly tested and refined based on feedback and performance evaluations. This iterative approach allows for continuous improvement of the models, leading to better quality and more reliable model generation over time. [0321] The Adversarial Fine-Tuning step may help enhance the capabilities of the AI models used in the AI object creation process. By fine-tuning the models through adversarial training, ACTIVE 708712445v1 54
Docket No. 212860-700001/PCT the system can produce higher-quality 3D objects with reduced reliance on human oversight, ultimately streamlining the object creation workflow. [0322] Deep ETL (Extract, Transform, Load) [0323] The Deep ETL (Extract, Transform, Load) step is where the system delves into the intricacies of the 3D objects being created. This step involves a thorough analysis of the data extraction, transformation, and loading processes to gain a comprehensive understanding of the models’ structure and properties. In some embodiments, the use of ETL modeling may provide deep understanding of game objects. In some embodiments, pre- and post-processing of objects may be provided. [0324] Extraction: The system begins by extracting data from the 3D objects, which includes geometric details, textures, materials, and other pertinent attributes. This data is extracted in a structured manner to facilitate further analysis and processing. [0325] Transformation: The extracted data undergoes transformation to enhance its utility and relevance for the object creation process. This may involve changing data formats, normalizing values, or applying other transformations to prepare the data for in-depth analysis and integration. [0326] Loading: After transformation, the data is loaded into the system’s knowledge base or other storage solutions for subsequent use. The loaded data enriches the system’s structured knowledge, contributing to a more profound understanding of 3D objects and their characteristics. [0327] Insights and Refinement: Leveraging the insights gained from the Deep ETL process, the system refines the object creation process. A more detailed understanding of the models’ structure and properties allows for informed decision-making and adjustments, resulting in enhanced quality and accuracy of the final models. [0328] Integration with Object creation: The insights and data derived from the Deep ETL process are integrated into the ongoing object creation process. This integration enables real-time adjustments and improvements based on the comprehensive understanding of the models being created. [0329] The Deep ETL (Extract, Transform, Load) step may help attain and/or achieve a deeper understanding of the 3D objects and their components. By meticulously analyzing the data ACTIVE 708712445v1 55
Docket No. 212860-700001/PCT extraction, transformation, and loading processes, the AI system can elevate the quality and accuracy of the object creation process, leading to more realistic and detailed final models. [0330] Latent Space Manipulation [0331] Latent Space Manipulation is where the system fine-tunes the outputs of an AI tool by directly manipulating its internal representations. This step may help ensure that the final 3D object aligns precisely with the creator’s vision and the technical requirements of the project. [0332] Accessing the Latent Space: The system accesses the latent space of the AI models, which is a multi-dimensional space where the internal representations of data of one or more AI tools are stored. This space contains the encoded features and characteristics of the 3D objects being created. [0333] Direct Manipulation: The system directly manipulates the representations in the latent space to adjust specific aspects of the 3D object. This can involve tweaking features related to the model’s shape, texture, color, or other attributes to achieve the desired outcomes. [0334] Precision and Control: Latent space manipulation provides a high degree of precision and control over the object creation process. By making targeted adjustments in the latent space, the system can achieve subtle and precise modifications that are not easily attainable through traditional methods. [0335] Alignment with Goals and Constraints: The manipulations in the latent space are guided by the goals and constraints established earlier in the process. This ensures that the adjustments are consistent with the creator’s requirements and the project’s technical specifications. [0336] Iterative Refinement: The process of latent space manipulation is often iterative, with multiple rounds of adjustments and evaluations. The system continuously refines the model based on feedback and performance metrics until it meets the desired standards of quality and accuracy. [0337] Latent Space Manipulation is a powerful step in the AI object creation process, providing the ability to fine-tune an AI tool’s outputs with a high level of precision. By directly manipulating the internal representations in the latent space, the system can ensure that the final 3D object is a faithful and accurate realization of the creator’s vision. [0338] Production-Ready Model ACTIVE 708712445v1 56
Docket No. 212860-700001/PCT [0339] The Production-Ready Model step is the culmination of the AI object creation process, where the final 3D object is prepared for integration into the creator’s project. This step ensures that the model meets all the specified requirements and is suitable for use in games, simulations, or other 3D environments. [0340] Final Quality Assurance: Before declaring the model as production-ready, the system performs a final round of quality assurance checks. This includes verifying that the model adheres to the established goals and constraints, ensuring that it is technically sound, and confirming that it meets the desired aesthetic standards. [0341] Optimization for Performance: The model undergoes optimization to ensure that it performs well in the intended environment. This may involve reducing polygon count, compressing textures, or making other adjustments to improve performance without compromising quality. [0342] Packaging and Exporting: The production-ready model is packaged and exported in a format that is compatible with the creator’s project. This includes organizing the model’s components, such as models, textures, and animations, into a cohesive package that can be easily integrated into the project’s workflow. [0343] Delivery to the Creator: The final model is delivered to the creator, along with any helpful documentation or metadata. The creator can then incorporate the model into their project, where it can be used as intended in the game, simulation, or other 3D environment. [0344] Post-Delivery Support: After the model is delivered, the system may provide post- delivery support to address any issues or adjustments that the creator may require. This ensures that the model continues to meet the project’s needs and performs adequately (e.g., optimally) in its intended environment. [0345] The Production-Ready Model step is the final phase in the AI object creation process, marking the transition of the 3D object from development to deployment. By ensuring that the object is of high quality, configured (e.g., optimized) for performance, and ready for integration, this step completes the journey from creative input to a tangible, production-ready object. [0346] Aspects of Some Examples of Systems and Methods for 3D Model Construction [0347] Retrieval Augmented Construction (RAC) for Few-Shot 3D object Details [0348] RAC can be used to semantically retrieve 3D object details that closely match the user’s request. Then the system can proceed to build a model based on the engineering details of the ACTIVE 708712445v1 57
retrieval. This process can involve texture and material semantic matching, rig and mesh matching, etc. Some examples of block diagrams illustrating retrieval augmented construction processes are shown in Fig. 9. [0349] Texture and Material Semantic Matching: RAC identifies textures and materials that are semantically related to the user’s request. For example, if the user specifies a “wooden” material, RAC retrieves textures that resemble wood grain patterns. [0350] Rig and Mesh Matching: RAC finds rigs that are compatible with similar meshes based on the user’s specifications. This ensures that newly constructed rigs are suitable for animating the corresponding 3D objects. [0351] Photogrammetry for Novel Views and Surfaces [0352] Camera Intrinsics and Extrinsics: Photogrammetry is used to capture detailed images of objects from multiple angles. The camera’s intrinsic parameters (e.g., focal length, sensor size) and extrinsic parameters (e.g., position, orientation) are meticulously calibrated to ensure accurate reconstruction of 3D objects. [0353] Data Generation for NeRF and Gaussian Splatter: The images captured through photogrammetry are processed to generate data understandable by neural radiance fields (NeRF) and Gaussian splatter techniques. This data is used to create novel views and surfaces with high levels of detail and realism. [0354] An example of a process of using photogrammetry to synthesize a target view of a 3D object is shown in Fig. 10. [0355] Improved Angle Selection for Photogrammetry [0356] In some examples, AI tools select the best angles for capturing the 3D object based on image encoding scores and cosine similarity, functioning in a manner akin to Principal Component Analysis (PCA). An illustration of the use of improved (e.g., optimal) angle selection for synthetic photogrammetry in a process of generating a 3D model of a castle is shown in Figs. 11A-11D. [0357] Image Encoding Scores: Each image captured from different angles is encoded using a neural network, producing a high-dimensional feature vector that represents the visual content of the image. These feature vectors are the image encoding scores, capturing key visual information of the object from various perspectives.
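By way of illustration only, the following is a minimal sketch, written in Python with a hypothetical `encoder` standing in for any suitable image-embedding network, of how such image encoding scores may be computed and compared to select diverse viewpoints; the cosine similarity measurement and angle selection it assumes are described in the paragraphs that follow, and the sketch is not intended as a definitive implementation of the techniques disclosed herein.

```python
import numpy as np

def encode_view(image, encoder):
    """Encode one captured view into a normalized, high-dimensional feature vector
    (an "image encoding score" in the sense described above)."""
    vec = np.asarray(encoder(image), dtype=float)   # encoder: hypothetical embedding network
    return vec / np.linalg.norm(vec)                # normalize so dot products equal cosine similarity

def select_informative_angles(views, encoder, num_angles):
    """Greedily keep the views whose encodings are least similar (lowest cosine
    similarity) to the views already selected, yielding diverse perspectives."""
    feats = np.stack([encode_view(v, encoder) for v in views])
    chosen = [0]                                    # seed with an arbitrary first view
    while len(chosen) < min(num_angles, len(views)):
        sims = feats @ feats[chosen].T              # cosine similarity to each selected view
        redundancy = sims.max(axis=1)               # how well each candidate is already covered
        redundancy[chosen] = np.inf                 # never re-select a chosen view
        chosen.append(int(np.argmin(redundancy)))   # pick the least redundant candidate
    return chosen                                   # indices of the selected camera angles
```

In this sketch, greedily selecting the candidate with the lowest maximum similarity to the already-selected views plays a role loosely analogous to extracting successive principal components: each newly chosen angle is the one that adds information the previously selected views do not already capture.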
Docket No. 212860-700001/PCT [0358] Cosine Similarity Measurement: In some examples, AI tools calculate the cosine similarity between the feature vectors of different images. Cosine similarity measures the cosine of the angle between two vectors, indicating their similarity in terms of visual content. A higher cosine similarity score suggests that the images capture similar features of the object. [0359] Analogous to PCA: The process of using image encoding scores and cosine similarity to select the best angles for photogrammetry is analogous to PCA. Like PCA, which identifies the principal components that capture variance (e.g., the most variance) in the data, some examples of AI tools select angles that provide diverse and informative views of the object (e.g., the most diverse and informative views of the object). This ensures a comprehensive representation of the object’s geometry and texture in the 3D reconstruction. Figs.25C-25E show examples of similarity level tests performed by a ControlNet model. [0360] Selection of Angles: Based on the cosine similarity scores, some examples of AI tools select angles (e.g., optimal angles) that offer distinct and informative perspectives of the object (e.g., the most distinct and informative perspectives of the object). These angles can ensure that the captured images collectively cover the object’s key features. [0361] Capturing Images: The camera is positioned at the selected angles to capture images of the object. These images are then used in the photogrammetry process to reconstruct the 3D object with high fidelity. [0362] By employing image encoding scores and cosine similarity in a manner akin to PCA, our system ensures that the images used for photogrammetry are captured from angles that provide valuable information (e.g., the most valuable information) for accurate 3D reconstruction. This approach enhances the quality of the final 3D object by ensuring comprehensive coverage of the object’s details from one or more informative perspectives (e.g., the most informative perspectives). [0363] It will be appreciated that these angle selection techniques can be used in connection with virtual photogrammetry (e.g., photogrammetry involving views of virtual objects) and/or in connection with physical photogrammetry (e.g., photogrammetry involving views of physical objects). [0364] Behavior Trees for Expert System [0365] Selectors, Actions, and Composites: Behavior trees can be used to structure the decision-making process of an expert system. Selectors can determine which action or sequence ACTIVE 708712445v1 59
Docket No. 212860-700001/PCT of actions to execute based on specific criteria. Actions are individual tasks performed by the system, such as selecting a tool or applying a texture. Composites combine multiple actions or selectors to form more complex behaviors. [0366] Decision Trees for Model Pipelines: Decision trees are used to build model pipelines and tool-forming jobs. They help in deciding the sequence of steps required to create a 3D object, from initial modeling to final texturing and animation. [0367] Conditioning the Model with Latent Space Manipulation [0368] Pre-Processing, Transforming, and Post-Processing: The generative process can involve conditioning the model through various stages. Pre-processing prepares the input data, transforming involves manipulating the latent spaces to achieve desired outcomes, and post- processing refines the generated 3D object. [0369] Diffusion Models: In some examples, diffusion models are used to generate high-quality 3D objects by gradually transforming a random noise distribution into a structured output. This process is guided by the conditioned latent spaces, facilitating alignment of the final model with the user’s specifications and creative vision. Some examples of results of conditioning a model with latent space manipulation are shown in Fig.13. [0370] Scalable AI infrastructure [0371] Fig. 15 is a block diagram of an example of a scalable AI infrastructure for 3D model generation. The AI Infrastructure is a sophisticated framework designed for advanced AI applications, particularly in 3D content creation and machine learning operations (MLOps). It features dockerized MLOps with GPU-accelerated images and headless UnrealEngine and Blender3D integration, ensuring efficient and automated 3D object generation processes. The infrastructure supports AI training and inference through both cloud and local nodes, specifically on consumer GPUs, offering scalable and flexible computing resources. It encompasses multi- modal databases that store embeddings, datasets, and models for various AI tasks. Additionally, a large synthetics database is included, providing extensive synthetic datasets and a massive collection of generic 3D base objects, essential for training AI models and conditioning content creation. [0372] In some embodiments, dockerized MLOps may provide GPU-accelerated image processing, headless Blender3D, work queues, etc. In some embodiments, AI training and inference nodes may include cloud and local nodes, consumer GPUs, etc. In some embodiments, ACTIVE 708712445v1 60
multi-modal databases may store embeddings, datasets, models, etc. In some embodiments, a synthetics database may store large synthetic datasets and facilitate the provision of a massive generic 3D base objects server. [0373] 3D Generative Object Pipeline [0374] The AI has an ever-growing base mesh database that it uses for fine-tuning models and early conditioning in the generation process. Some examples of base meshes are shown in Fig. 12. The base meshes are stored with a unified multi-modal neural embedding (text, audio, 3D, video, etc.), meaning all media encodes to the same intermediate representation. Together with a user query (or prompt), we find the nearest base objects required to kick off the generative pipeline. Fig. 16 is a data flow diagram illustrating an example method for finding a base mesh that matches (e.g., most closely matches) a user query. [0375] Given the base objects and the user request, an autonomous task management agent searches for the best Tool Agent for the job. The initial tool agent may be the Blender Agent. Each tool agent has a decision tree that helps it navigate the complexity of a 3D pipeline. [0376] In some examples, advanced conditioning is applied to one or more AI models; that is, extra information is generated synthetically to help the AI models focus their inference budget on what matters. Fig. 17 is a block diagram illustrating a system for conditioning a digital object (e.g., an image or 3D model). [0377] In some examples, cross-domain diffusion models are used. Such models can generate multi-view normal and color maps from single points of view. Fig. 18 is a dataflow diagram illustrating an example method for generating multi-view normal and color maps from single points of view. The agents can enter into an iterative back-and-forth between synthetic and generative generation until they converge to a final result. [0378] In some examples, a blender model is used. Fig. 25B shows a configuration of a blender that includes a stable diffusion model and a ControlNet model. [0379] Fig. 19 shows illustrations of intermediate and final states of 3D models of various objects. As the final stage, the AI reconstructs a scene (similar to photogrammetry, 3D laser scan, etc.). Given the original control base objects and the virtual camera intrinsics and extrinsics used to generate the synthetic viewpoints, the system can perform state-of-the-art 3D reconstruction back onto professionally rigged objects. [0380] Textures and Materials Generation Pipeline
Docket No. 212860-700001/PCT [0381] In some examples, an expert system simulates the decision-making ability of a technical artist expert during a process of generating textures and materials. [0382] Figs. 20A-20C show a flowchart of a texture generation process. In particular, FIG. 20A shows a flowchart of a texture generation process 2000, FIG. 20B shows a flowchart of a normal map sub-process 2040 of the texture generation process 2000, and FIG.20C shows a flowchart of a segmentation map sub-process 2020 of the texture generation process 2000. In some embodiments, the texture generation process uses stable diffusion for the automatic generation of seamless and unique features conditioned by segmentation maps for single and/or multiple 3D objects. Additionally or alternatively, generation of features may be conditioned by normal maps and/or ambient occlusion textures. [0383] Fig. 20D shows a flowchart of a UV map node arrangement process for unconventional materials. Fig.20E shows a flowchart of another texture generation process. [0384] In some examples, the use of an expert system in-the-loop goes beyond the traditional human-in-the-loop model used for training models like ChatGPT 4, integrating the expertise of look development professionals directly into the AI tool’s learning and decision-making processes. By involving experts in look development, the AI models can gain an unprecedented depth of aesthetic and design intelligence, enabling them to handle complex visual tasks with a level of detail and accuracy unmatched in the industry. In some examples, the expert system in- the-loop guides the AI to operate with an understanding of visual artistry and technical precision. [0385] In some examples, an AI framework is structured as a computation graph, a network of interconnected nodes where each node represents a distinct computational operation or a step in an AI tool's learning process. This graph is intricately designed to integrate the insights from the expert system at every stage, ensuring that the AI tool's learning and decision-making are continually influenced by expert knowledge. [0386] Additional Examples [0387] Fig. 21 is a block diagram illustrating an example of some of the operations performed by a tool-forming 3D pipeline while generating a 3D model of a car. [0388] Fig. 22 shows examples of user interfaces of an AI tool for styling and upscaling a 3D model. [0389] In some examples, the latent space of one or more AI tools is exposed. The latent space is a multi-dimensional space where an AI tool's internal representations of data are stored. By ACTIVE 708712445v1 62
Docket No. 212860-700001/PCT making this space accessible at every node of the computation graph, we allow for an unprecedented level of transparency and control. This design enables both human experts and the LookDev-expert-system to fine-tune the AI tool's outputs by directly manipulating its internal representations. [0390] Fig. 23A shows an example of a user interface for a sampler tool. [0391] At each node, the latent space can be examined and adjusted. For instance, if an AI tool is processing visual data, experts in look development can intervene to adjust features related to texture, lighting, or color, refining the AI tool's output to align with professional standards and aesthetic considerations. Similarly, in processing textual data, language experts might tweak aspects related to semantics or style. [0392] Fig. 23B shows an example of a user interface for a 3D modeling tool. [0393] This system of continuous interaction with the latent space ensures that the AI tool's learning is not just a black-box process but a dynamic, interactive journey. It allows for iterative improvements and refinements, making the AI tool's outputs more precise, tailored, and aligned with expert-level standards. [0394] This approach creates a feedback loop where the AI tool not only learns from the initial expert input but also from the ongoing adjustments made in its latent space. This results in a model that is not only expert-driven but also continually evolving and improving, capable of adapting to new standards and insights in its field of application. [0395] Figs. 24A-24C show sequences of inputs to a 3D model generation tool and outputs from the tool during the generation and conditioning of a model of a sofa. Figs.24D-24F show sequences of inputs to a 3D model generation tool and outputs from the tool during the generation and conditioning of a model of a tiger head. Figs.24G-24I show sequences of inputs to a 3D model generation tool and outputs from the tool during the generation and conditioning of a model of a treasure chest. [0396] Fast Generative AI Tools [0397] Achieving high-performance game object generation in the system involves a combination of advanced techniques to increase efficiency and speed of operation and the use of efficient programming and processing tools. In some cases, such a process may include one or more of the following: ACTIVE 708712445v1 63
Docket No. 212860-700001/PCT [0398] Operation with Rust Programming Language: Rust is known for its high performance and safety, particularly in systems programming. By utilizing Rust, we ensure that the object generation pipeline is not only fast but also reliable and secure. Rust's efficiency in memory management and its ability to prevent runtime errors contribute significantly to the performance of the object generation process. [0399] Leveraging GStreamer for Media Processing: GStreamer, a powerful multimedia framework, is used for handling various media processing tasks. This tool is particularly effective for managing and manipulating audio and video files, which are integral parts of game objects. GStreamer's pipeline-based structure allows for high customization and optimization, making it an ideal choice for complex media operations required in game object generation. [0400] Porting Pre and Post Processing to the GPU: GPUs (Graphics Processing Unit) may be used for intensive computing tasks. By porting both pre-processing and post-processing game objects tasks to the GPU, we leverage its parallel processing capabilities. [0401] GPU Acceleration for Enhanced Performance: Alongside porting tasks to the GPU, we also implement GPU acceleration techniques. This involves optimizing some examples of AI tools and processes to take full advantage of the GPU's architecture. The Blender and UnrealEngine headless servers also make full use of GPU- acceleration inside their containers. [0402] These are a few examples of models and tools we've accelerated with Rust, GStreamer and GPU-tensors: Monocular Depth Prediction (for creating depth maps from streaming video input, including webcams); Semantic Segmentation (for creating real-time classification from streaming video input, including webcams); Motion Transfer (for real-time identification of body parts, tracking them in a video, and figuring out how they should move based on their shape and position); Face alignment (a Realtime complement for helping 2D models extract 3D face poses); Salient Object Detection (for finding in Realtime an important object (e.g., the most important object) to pay attention to). Terminology, Equivalents, and Additional Embodiments [0403] Embodiments have been described where the techniques are implemented in circuitry and/or computer-executable instructions. It should be appreciated that some embodiments may be in the form of a method, of which at least one example has been provided. The acts performed as part of the method may be ordered in any suitable way. Accordingly, embodiments may be ACTIVE 708712445v1 64
Docket No. 212860-700001/PCT constructed in which acts are performed in an order different than illustrated, which may include performing some acts simultaneously, even though shown as sequential acts in illustrative embodiments. [0404] Various aspects of the embodiments described above may be used alone, in combination, or in a variety of arrangements not specifically discussed in the embodiments described in the foregoing and is therefore not limited in its application to the details and arrangement of components set forth in the foregoing description or illustrated in the drawings. For example, aspects described in one embodiment may be combined in any manner with aspects described in other embodiments. [0405] Use of ordinal terms such as “first,” “second,” “third,” etc., in the claims to modify a claim element does not by itself connote any priority, precedence, or order of one claim element over another or the temporal order in which acts of a method are performed, but are used merely as labels to distinguish one claim element having a certain name from another element having a same name (but for use of the ordinal term) to distinguish the claim elements. [0406] Also, the phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting. The use of “including,” “comprising,” “having,” “containing,” “involving,” and variations thereof herein, is meant to encompass the items listed thereafter and equivalents thereof as well as additional items. [0407] The word “exemplary” is used herein to mean serving as an example, instance, or illustration. Any embodiment, implementation, process, feature, etc. described herein as exemplary should therefore be understood to be an illustrative example and should not be understood to be a preferred or advantageous example unless otherwise indicated. [0408] Unless otherwise noted, the terms “connected to” and “coupled to” (and their derivatives), as used in the specification and/or claims, are to be construed as permitting both direct and indirect (i.e., via other elements or components) connection. [0409] It will be further understood by those within the art that if a specific number of an introduced claim recitation is intended, such an intent will be explicitly recited in the claim, and in the absence of such recitation, no such intent is present. For example, as an aid to understanding, the following appended claims may contain usage of the introductory phrases “at least one” and “one or more” to introduce claim recitations. However, the use of such phrases should not be construed to imply that the introduction of a claim recitation by the indefinite ACTIVE 708712445v1 65
Docket No. 212860-700001/PCT articles “a” or “an” limits any particular claim containing such introduced claim recitation to inventions containing only one such recitation, even when the same claim includes the introductory phrases “one or more” or “at least one” and indefinite articles such as “a” or “an” (e.g., “a” and/or “an” should typically be interpreted to mean “at least one” or “one or more”); the same holds true for the use of definite articles used to introduce claim recitations. In addition, even if a specific number of an introduced claim recitation is explicitly recited, those skilled in the art will recognize that such recitation should typically be interpreted to mean at least the recited number (e.g., the bare recitation of “two recitations,” without other modifiers, typically means at least two recitations, or two or more recitations). [0410] Furthermore, in those instances where a convention analogous to “at least one of A, B, and C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., “a system having at least one of A, B, and C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). In those instances where a convention analogous to “at least one of A, B, or C, etc.” is used, in general, such a construction is intended in the sense one having skill in the art would understand the convention (e.g., “a system having at least one of A, B, or C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). It will be further understood by those within the art that virtually any disjunctive word and/or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase “A or B” will be understood to include the possibilities of “A” or “B” or “A and B.” [0411] Having thus described several aspects of at least one embodiment, it is to be appreciated that various alterations, modifications, and improvements will readily occur to those skilled in the art. Such alterations, modifications, and improvements are intended to be part of this disclosure, and are intended to be within the spirit and scope of the principles described herein. Accordingly, the foregoing description and drawings are by way of example only. ACTIVE 708712445v1 66
Claims
Docket No. 212860-700001/PCT CLAIMS What is claimed is: 1. A three-dimensional modeling method comprising: (a) obtaining, by one or more processors, a first plurality of encodings of a first plurality of images of an object; (b) generating, by the one or more processors and one or more image-generating models, a second plurality of encodings of a second plurality of images of the object, the second plurality of encodings being based on the first plurality of encodings and on conditioning data; (c) constructing, by the one or more processors based on the second plurality of encodings, a 3D model of the object, the 3D model indicating (1) a geometric form of the object and (2) a texture and/or material of at least one surface of the object; and repeating steps (a) – (c) until the one or more processors determine that the 3D model of the object satisfies one or more criteria. 2. The method of claim 1, wherein the 3D model of the object is a second 3D model of the object, the method further comprising: obtaining a first 3D model of the object; and rendering, from a plurality of viewpoints of the first 3D model of the object, the first plurality of images of the object. 3. The method of claim 2, wherein the first 3D model of the object is obtained from a vector database based on a description of the object. 4. The method of claim 1, further comprising: obtaining, from a vector database based on a description of the object, the first plurality of images of the object. 5. The method of claim 1, wherein the 3D model of the object is a second 3D model of the object, the method further comprising: obtaining a first 3D model of the object; ACTIVE 708712445v1 67
Docket No. 212860-700001/PCT rendering, from a set of viewpoints of the first 3D model of the object, a set of views of the object; calculating a plurality of scores based on encodings of views in the set of views, wherein the scores indicate an amount of similarity between or among groups of two or more views in the set of views; and obtaining, based on the plurality of scores, the first plurality of images of the object. 6. The method of claim 5, wherein obtaining the first plurality of images comprises: selecting, based on the plurality of scores, a subset of the set of views, wherein the first plurality of images is the selected subset of the set of views. 7. The method of claim 5, wherein the set of views is a first set of views, the set of viewpoints is a first set of viewpoints, and obtaining the first plurality of images of the object comprises: determining, based on the plurality of scores, a second set of viewpoints of the object; and rendering, by the one or more processors, a second set of views of the first 3D model of the object from the second set of viewpoints, wherein the first plurality of images is the second set of views. 8. The method of claim 5, wherein: the encodings of the views in the set of views are embeddings of the views in the set of views, and the amount of similarity between or among a respective group of two or more views is determined based on cosine similarity between the embeddings of respective views of the two or more views. 9. The method of claim 5, wherein a total number of images in the first plurality of images is less than a total number of views in the set of views. ACTIVE 708712445v1 68
Docket No. 212860-700001/PCT 10. The method of claim 2, wherein the first plurality of encodings is a first plurality of embeddings, the second plurality of encodings is a second plurality of embeddings, and the generating the second plurality of embeddings comprises: conditioning an image generation process performed in a latent space of the one or more image-generating models on the first plurality of embeddings and on one or more embeddings obtained from a vector database based on the conditioning data. 11. The method of claim 10, wherein the conditioning data include a description of the object. 12. The method of claim 11, wherein the description of the object includes a description of one or more geometric attributes of the object, one or more visual attributes of the object, and/or one or more optical attributes of the object. 13. The method of claim 10, wherein the conditioning data include a description of one or more alterations to (1) an aesthetic of the object as depicted in the first plurality of images or represented in the first 3D model of the object, (2) a geometric attribute of the object as depicted in the first plurality of images or represented in the first 3D model of the object, (3) a visual attribute of the object as depicted in the first plurality of images or represented in the first 3D model of the object, and/or (4) an optical attribute of the object as depicted in the first plurality of images or represented in the first 3D model of the object. 14. The method of claim 10, wherein the generating the second plurality of embeddings further comprises providing, by the one or more processors, the first plurality of embeddings as input to the one or more image-generating models. 15. The method of claim 14, wherein the one or more image-generating models comprise a latent, text-to-image diffusion model. 16. The method of claim 1, wherein the geometric form of the 3D model includes one or more shapes, dimensions, and/or orientations of one or more portions of the 3D model. ACTIVE 708712445v1 69
Docket No. 212860-700001/PCT 17. The method of claim 1, wherein one or more visual attributes and/or optical attributes of the object includes a texture, material, shading, lighting, reflectivity, and/or color of a surface of the object. 18. The method of claim 1, wherein the one or more criteria include (1) receipt of user input indicating that the 3D model of the object is satisfactory, (2) receipt of user input requesting termination of modeling, (3) expiry of a maximum time period allocated for the modeling, and/or (4) use of a maximum amount of computational resources allocated for the modeling. 19. At least one computer-readable storage medium encoded with computer-executable instructions that, when executed by at least one processor, cause the at least one processor to perform operations comprising: (a) obtaining a first plurality of encodings of a first plurality of images of an object; (b) generating, by one or more image-generating models, a second plurality of encodings of a second plurality of images of the object, the second plurality of encodings being based on the first plurality of encodings and on conditioning data; (c) constructing, based on the second plurality of encodings, a 3D model of the object, the 3D model indicating (1) a geometric form of the object and (2) a texture and/or material of at least one surface of the object; and repeating steps (a) – (c) until the at least one processor determines that the 3D model of the object satisfies one or more criteria. 20. A system comprising: at least one processor; and at least one storage medium having encoded thereon executable instructions that, when executed by the at least one processor, cause the at least one processor to carry out a method comprising: (a) obtaining a first plurality of encodings of a first plurality of images of an object; ACTIVE 708712445v1 70
Docket No. 212860-700001/PCT (b) generating, by one or more image-generating models, a second plurality of encodings of a second plurality of images of the object, the second plurality of encodings being based on the first plurality of encodings and on conditioning data; (c) constructing, based on the second plurality of encodings, a 3D model of the object, the 3D model indicating (1) a geometric form of the object and (2) a texture and/or material of at least one surface of the object; and repeating steps (a) – (c) until the at least one processor determines that the 3D model of the object satisfies one or more criteria. 21. A three-dimensional modeling method comprising: obtaining, by one or more processors, a plurality of encodings of a first plurality of images of an object; calculating, by the one or more processors, a plurality of scores based on the plurality of encodings, wherein each score indicates an amount of similarity between or among a respective group of two or more images of the first plurality of images; obtaining, by the one or more processors and based on the plurality of scores, a second plurality of images; and reconstructing, by the one or more processors and based on the second plurality of images, a three-dimensional (3D) model of the object. 22. The method of claim 21, wherein the first plurality of images depict a plurality of views of the object, and wherein the object is a virtual 3D object or a physical 3D object. 23. The method of claim 21, wherein the 3D model of the object is a second 3D model, and wherein obtaining the first plurality of images of the object comprises: rendering, by the one or more processors, a plurality of views of a first 3D model of the object from a plurality of viewpoints, wherein the first plurality of images include the plurality of views. 24. The method of claim 23, wherein obtaining the second plurality of images comprises: ACTIVE 708712445v1 71
Docket No. 212860-700001/PCT selecting, by the one or more processors and based on the plurality of scores, a subset of the first plurality of images, wherein the second plurality of images is the selected subset of the first plurality of images. 25. The method of claim 23, wherein the plurality of views is a first plurality of views, the plurality of viewpoints is a first plurality of viewpoints, and obtaining the second plurality of images of the object comprises: determining, by the one or more processors and based on the plurality of scores, a second plurality of viewpoints of the object; and rendering, by the one or more processors, a second plurality of views of the first 3D model of the object from the second plurality of viewpoints, wherein the second plurality of images is the second plurality of views. 26. The method of claim 23, wherein the obtaining the first plurality of images of the object further comprises: obtaining, by the one or more processors, the first 3D model of the object. 27. The method of claim 26, wherein obtaining the first 3D model of the object comprises: generating, by the one or more processors and one or more image-generating models, one or more views of the object based on a description of the object, wherein the first 3D model of the object is reconstructed based on at least a subset of the one or more generated views. 28. The method of claim 21, wherein: the plurality of encodings of the first plurality of images are a plurality of embeddings of the first plurality of images, and the amount of similarity between or among a respective group of two or more images is determined based on cosine similarity between the embeddings of respective pairs of the two or more images. ACTIVE 708712445v1 72
29. The method of claim 21, wherein a total number of the second plurality of images is less than a total number of the first plurality of images. 30. At least one computer-readable storage medium encoded with computer-executable instructions that, when executed by at least one processor, cause the at least one processor to carry out a method comprising: obtaining a plurality of encodings of a first plurality of images of an object; calculating a plurality of scores based on the plurality of encodings, wherein each score indicates an amount of similarity between or among a respective group of two or more images of the first plurality of images; obtaining, based on the plurality of scores, a second plurality of images; and reconstructing, based on the second plurality of images, a three-dimensional (3D) model of the object. 31. The at least one computer-readable storage medium of claim 30, wherein the 3D model of the object is a second 3D model, and wherein obtaining the first plurality of images of the object comprises: rendering a plurality of views of a first 3D model of the object from a plurality of viewpoints, wherein the first plurality of images include the plurality of views. 32. The at least one computer-readable storage medium of claim 31, wherein obtaining the second plurality of images comprises: selecting, based on the plurality of scores, a subset of the first plurality of images, wherein the second plurality of images is the selected subset of the first plurality of images. 33. The at least one computer-readable storage medium of claim 31, wherein the plurality of views is a first plurality of views, the plurality of viewpoints is a first plurality of viewpoints, and obtaining the second plurality of images of the object comprises: determining, based on the plurality of scores, a second plurality of viewpoints of the object; and
rendering a second plurality of views of the first 3D model of the object from the second plurality of viewpoints, wherein the second plurality of images is the second plurality of views. 34. The at least one computer-readable storage medium of claim 30, wherein: the plurality of encodings of the first plurality of images are a plurality of embeddings of the first plurality of images, and the amount of similarity between or among a respective group of two or more images is determined based on cosine similarity between the embeddings of respective pairs of the two or more images. 35. The at least one computer-readable storage medium of claim 34, wherein a total number of the second plurality of images is less than a total number of the first plurality of images. 36. A system comprising: at least one processor; and at least one storage medium having encoded thereon executable instructions that, when executed by the at least one processor, cause the at least one processor to carry out a method comprising: obtaining a plurality of encodings of a first plurality of images of an object; calculating a plurality of scores based on the plurality of encodings, wherein each score indicates an amount of similarity between or among a respective group of two or more images of the first plurality of images; obtaining, based on the plurality of scores, a second plurality of images; and reconstructing, based on the second plurality of images, a three-dimensional (3D) model of the object. 37. The system of claim 36, wherein the 3D model of the object is a second 3D model, and wherein obtaining the first plurality of images of the object comprises:
Docket No. 212860-700001/PCT rendering a plurality of views of a first 3D model of the object from a plurality of viewpoints, wherein the first plurality of images include the plurality of views. 38. The system of claim 37, wherein obtaining the second plurality of images comprises: selecting, based on the plurality of scores, a subset of the first plurality of images, wherein the second plurality of images is the selected subset of the first plurality of images. 39. The system of claim 37, wherein the plurality of views is a first plurality of views, the plurality of viewpoints is a first plurality of viewpoints, and obtaining the second plurality of images of the object comprises: determining, based on the plurality of scores, a second plurality of viewpoints of the object; and rendering a second plurality of views of the first 3D model of the object from the second plurality of viewpoints, wherein the second plurality of images is the second plurality of views. 40. The system of claim 36, wherein: the plurality of encodings of the first plurality of images are a plurality of embeddings of the first plurality of images, and the amount of similarity between or among a respective group of two or more images is determined based on cosine similarity between the embeddings of respective pairs of the two or more images. 41. A three-dimensional modeling method comprising: obtaining, by one or more processors, a first plurality of encodings of a first plurality of images of an object; generating, by the one or more processors and one or more image-generating models, a second plurality of encodings of a second plurality of images of the object, the second plurality of encodings being based on the first plurality of encodings and on conditioning data; and constructing, by the one or more processors based on the second plurality of encodings, a geometric form of a 3D model of the object. ACTIVE 708712445v1 75
Docket No. 212860-700001/PCT 42. The method of claim 41, wherein the 3D model of the object is a second 3D model of the object, the method further comprising: obtaining a first 3D model of the object; and rendering, from a plurality of viewpoints of the first 3D model of the object, the first plurality of images of the object. 43. The method of claim 42, wherein the first 3D model of the object is obtained from a vector database based on a description of the object. 44. The method of claim 41, further comprising: obtaining, from a vector database based on a description of the object, the first plurality of images of the object. 45. The method of claim 41, wherein the first plurality of encodings is a first plurality of embeddings, the second plurality of encodings is a second plurality of embeddings, and the generating the second plurality of embeddings comprises: conditioning an image generation process performed in a latent space of the one or more image-generating models on the first plurality of embeddings and on one or more embeddings obtained from a vector database based on the conditioning data. 46. The method of claim 45, wherein the conditioning data include a description of the object and/or a description of one or more geometric attributes of the object. 47. The method of claim 45, wherein the generating the second plurality of embeddings further comprises providing, by the one or more processors, the first plurality of embeddings as input to the one or more image-generating models. 48. The method of claim 47, wherein the one or more image-generating models comprise a latent, text-to-image diffusion model. ACTIVE 708712445v1 76
Docket No. 212860-700001/PCT 49. The method of claim 41, wherein the geometric form of the 3D model includes one or more shapes, dimensions, and/or orientations of one or more portions of the 3D model. 50. The method of claim 41, wherein the constructing the geometric form of the 3D model of the object comprises: determining whether the geometric form of the 3D model of the object satisfies one or more criteria. 51. At least one computer-readable storage medium encoded with computer-executable instructions that, when executed by at least one processor, cause the at least one processor to carry out a method comprising: obtaining a first plurality of encodings of a first plurality of images of an object; generating, by one or more image-generating models, a second plurality of encodings of a second plurality of images of the object, the second plurality of encodings being based on the first plurality of encodings and on conditioning data; and constructing, based on the second plurality of encodings, a geometric form of a 3D model of the object. 52. The at least one computer-readable storage medium of claim 51, wherein the 3D model of the object is a second 3D model of the object, the method further comprising: obtaining a first 3D model of the object; and rendering, from a plurality of viewpoints of the first 3D model of the object, the first plurality of images of the object. 53. The at least one computer-readable storage medium of claim 52, further comprising: obtaining, from a vector database based on a description of the object, the first 3D model of the object or the first plurality of images of the object. 54. The at least one computer-readable storage medium of claim 51, wherein the first plurality of encodings is a first plurality of embeddings, the second plurality of encodings is a ACTIVE 708712445v1 77
second plurality of embeddings, and the generating the second plurality of embeddings comprises: conditioning an image generation process performed in a latent space of the one or more image-generating models on the first plurality of embeddings and on one or more embeddings obtained from a vector database based on the conditioning data. 55. The at least one computer-readable storage medium of claim 51, wherein: the geometric form of the 3D model includes one or more shapes, dimensions, and/or orientations of one or more portions of the 3D model; and the constructing the geometric form of the 3D model of the object comprises: determining whether the geometric form of the 3D model of the object satisfies one or more criteria. 56. A system comprising: at least one processor; and at least one storage medium having encoded thereon executable instructions that, when executed by the at least one processor, cause the at least one processor to carry out a method comprising: obtaining a first plurality of encodings of a first plurality of images of an object; generating, by one or more image-generating models, a second plurality of encodings of a second plurality of images of the object, the second plurality of encodings being based on the first plurality of encodings and on conditioning data; and constructing, based on the second plurality of encodings, a geometric form of a 3D model of the object. 57. The system of claim 56, wherein the 3D model of the object is a second 3D model of the object, the method further comprising: obtaining a first 3D model of the object; and rendering, from a plurality of viewpoints of the first 3D model of the object, the first plurality of images of the object.
Docket No. 212860-700001/PCT 58. The system of claim 57, further comprising: obtaining, from a vector database based on a description of the object, the first 3D model of the object or the first plurality of images of the object. 59. The system of claim 56, wherein the first plurality of encodings is a first plurality of embeddings, the second plurality of encodings is a second plurality of embeddings, and the generating the second plurality of embeddings comprises: conditioning an image generation process performed in a latent space of the one or more image-generating models on the first plurality of embeddings and on one or more embeddings obtained from a vector database based on the conditioning data. 60. The system of claim 56, wherein: the geometric form of the 3D model includes one or more shapes, dimensions, and/or orientations of one or more portions of the 3D model; and the constructing the geometric form of the 3D model of the object comprises: determining whether the geometric form of the 3D model of the object satisfies one or more criteria. 61. A three-dimensional modeling method comprising: obtaining, by one or more processors, a first encoding of a first image; generating, by the one or more processors and one or more image-generating models, a second encoding of a second image, the second encoding being based on the first encoding and on conditioning data; and mapping, by the one or more processors, the second image to a surface of a three- dimensional (3D) model of an object, thereby defining a texture and/or material of the surface of the 3D model. 62. The method of claim 61, wherein the 3D model of the object is a second 3D model of the object, the method further comprising: obtaining a first 3D model of the object; and ACTIVE 708712445v1 79
Docket No. 212860-700001/PCT rendering, from a viewpoint of the first 3D model of the object, a view of the surface of the model, wherein the first image includes at least a portion of the view. 63. The method of claim 62, wherein the first 3D model of the object is obtained from a vector database based on a description of the object. 64. The method of claim 61, wherein the first image is obtained from a vector database based on a description of the texture and/or material. 65. The method of claim 61, wherein the first encoding is a first embedding, the second encoding is a second embedding, and the generating the second embedding comprises: conditioning an image generation process performed in a latent space of the one or more image-generating models on the embedding and on the conditioning data. 66. The method of claim 65, wherein the conditioning data include a description of one or more visual attributes and/or optical attributes of the object. 67. The method of claim 66, wherein the description of the one or more visual attributes and/or optical attributes of the object includes a description of a texture, material, shading, lighting, reflectivity, and/or color of a surface of the object. 68. The method of claim 65, wherein the generating the second embedding comprises providing, by the one or more processors, the first embedding as input to the one or more image- generating models. 69. The method of claim 68, wherein the one or more image-generating models comprise a latent, text-to-image diffusion model. 70. The method of claim 61, further comprises: determining whether the texture and/or material of the surface of the 3D model satisfy one or more criteria. ACTIVE 708712445v1 80
Docket No. 212860-700001/PCT 71. At least one computer-readable storage medium encoded with computer-executable instructions that, when executed by at least one processor, cause the at least one processor to carry out a method comprising: obtaining a first encoding of a first image; generating, by one or more image-generating models, a second encoding of a second image, the second encoding being based on the first encoding and on conditioning data; and mapping the second image to a surface of a three-dimensional (3D) model of an object, thereby defining a texture and/or material of the surface of the 3D model. 72. The at least one computer-readable storage medium of claim 71, wherein the 3D model of the object is a second 3D model of the object, the method further comprising: obtaining a first 3D model of the object; and rendering, from a viewpoint of the first 3D model of the object, a view of the surface of the model, wherein the first image includes at least a portion of the view. 73. The at least one computer-readable storage medium of claim 71, wherein the first image is obtained from a vector database based on a description of the texture and/or material. 74. The at least one computer-readable storage medium of claim 71, wherein the first encoding is a first embedding, the second encoding is a second embedding, and the generating the second embedding comprises: conditioning an image generation process performed in a latent space of the one or more image-generating models on the embedding and on an embedding obtained from a vector database based on the conditioning data. 75. The at least one computer-readable storage medium of claim 71, further comprises: determining whether the texture and/or material of the surface of the 3D model satisfy one or more criteria. 76. A system comprising: ACTIVE 708712445v1 81
Docket No. 212860-700001/PCT at least one processor; and at least one storage medium having encoded thereon executable instructions that, when executed by the at least one processor, cause the at least one processor to carry out a method comprising: obtaining a first encoding of a first image; generating, by one or more image-generating models, a second encoding of a second image, the second encoding being based on the first encoding and on conditioning data; and mapping the second image to a surface of a three-dimensional (3D) model of an object, thereby defining a texture and/or material of the surface of the 3D model. 77. The system of claim 76, wherein the 3D model of the object is a second 3D model of the object, the method further comprising: obtaining a first 3D model of the object; and rendering, from a viewpoint of the first 3D model of the object, a view of the surface of the model, wherein the first image includes at least a portion of the view. 78. The system of claim 76, wherein the first image is obtained from a vector database based on a description of the texture and/or material. 79. The system of claim 76, wherein the first encoding is a first embedding, the second encoding is a second embedding, and the generating the second embedding comprises: conditioning an image generation process performed in a latent space of the one or more image-generating models on the embedding and on an embedding obtained from a vector database based on the conditioning data. ACTIVE 708712445v1 82
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202463566361P | 2024-03-17 | 2024-03-17 | |
| US63/566,361 | 2024-03-17 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025199044A1 (en) | 2025-09-25 |
Family
ID=97140247
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2025/020256 Pending WO2025199044A1 (en) | 2024-03-17 | 2025-03-17 | Systems and methods for ai-assisted construction of three-dimensional models |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025199044A1 (en) |
Patent Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20230154105A1 (en) * | 2016-08-15 | 2023-05-18 | Packsize Llc | System and method for three-dimensional scanning and for capturing a bidirectional reflectance distribution function |
| US20240005604A1 (en) * | 2022-05-19 | 2024-01-04 | Nvidia Corporation | Synthesizing three-dimensional shapes using latent diffusion models in content generation systems and applications |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Foo et al. | Ai-generated content (aigc) for various data modalities: A survey | |
| US11893763B2 (en) | Generating modified digital images utilizing a global and spatial autoencoder | |
| US10176404B2 (en) | Recognition of a 3D modeled object from a 2D image | |
| Shen et al. | Clipgen: A deep generative model for clipart vectorization and synthesis | |
| JP2020115337A (en) | Set of neural networks | |
| JP2019032820A (en) | Data set for learning functions with image as input | |
| Wang et al. | 3D human motion editing and synthesis: A survey | |
| JP2022036024A (en) | Neural network for outputting 3d model converted to parameter | |
| JP2022036023A (en) | Variation auto encoder for outputting 3d model | |
| US20220114289A1 (en) | Computer architecture for generating digital asset representing footwear | |
| Nazarieh et al. | A survey of cross-modal visual content generation | |
| JP7457211B2 (en) | Computing platform for facilitating augmented reality experiences with third party assets | |
| Han et al. | Attribute-sentiment-guided summarization of user opinions from online reviews | |
| Herrmann et al. | Accelerating statistical human motion synthesis using space partitioning data structures | |
| CN118762142A (en) | Text-driven three-dimensional model generation method, system, device and medium | |
| Sui et al. | A survey on human interaction motion generation | |
| KR100898991B1 (en) | Apparatus for shader providing and transformation of 3d graphic system | |
| CN120472082A (en) | Learning continuous control for 3D-aware image generation on a text-to-image diffusion model | |
| WO2025199044A1 (en) | Systems and methods for ai-assisted construction of three-dimensional models | |
| Zhan et al. | CharacterMixer: Rig‐Aware Interpolation of 3D Characters | |
| US11972534B2 (en) | Modifying materials of three-dimensional digital scenes utilizing a visual neural network | |
| Wu et al. | Contrastive disentanglement for self-supervised motion style transfer | |
| WO2021203076A1 (en) | Method for understanding and synthesizing differentiable scenes from input images | |
| Chen et al. | CSG-based ML-supported 3D translation of sketches into game assets for game designers | |
| Fukaya | User-centred artificial intelligence for game design and development with GAGeTx: Graphical Asset Generation and Transformation |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 25774436; Country of ref document: EP; Kind code of ref document: A1 |