WO2025126150A1 - Denoising diffusion probabilistic models for post-treatment anatomy prediction in digital oral care
- Publication number: WO2025126150A1 (PCT/IB2024/062651)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- patient
- representations
- dentition
- treatment
- teeth
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G16H30/40—ICT specially adapted for the handling or processing of medical images for processing medical images, e.g. editing
- A61C7/002—Orthodontic computer assisted systems
- G06N20/00—Machine learning
- G06N3/02—Neural networks
- G06N3/045—Combinations of networks
- G06N3/047—Probabilistic or stochastic networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining for computer-aided diagnosis, e.g. based on medical expert systems
- G16H50/50—ICT specially adapted for medical diagnosis, medical simulation or medical data mining for simulation or modelling of medical disorders
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining for mining of medical data, e.g. analysing previous cases of other patients
Definitions
- Each of the following Patent Applications is incorporated herein by reference: 63/432,627; 63/366,492; 63/366,495; 63/352,850; 63/366,490; 63/366,494; 63/370,160; 63/366,507; 63/352,877; 63/366,514; 63/366,498; 63/264,914; 63/609581; 63/588028; 63/609938; and 63/432,627.
- the present disclosure describes systems and techniques for training and using one or more flow-based ML models (e.g., normalizing flows or denoising diffusion models) to generate 2D or 3D representations of a patient's predicted post-treatment anatomy.
- flow-based ML models include denoising diffusion probabilistic models (e.g., where denoising is performed by trained neural networks such as U-Nets or other encoder-decoder structures).
- Denoising diffusion-based techniques are described which combine a representation of the patient’s post-treatment target dentition (e.g., 3D meshes in final setup or final occlusion poses) with a representation of the patient’s anatomy (e.g., a 2D photo or a 3D scan of the face).
- U-Nets are an example of models that may enable improvements to data precision.
- Other hierarchical neural network feature extraction models can, alternatively, be used to perform denoising operations.
- Techniques of this disclosure include methods of generating a data structure describing the post-treatment anatomy of a patient (e.g., a predicted smile 530).
- One or more oral care arguments 502 may be provided to such methods.
- the one or more oral care arguments 502 may describe the target output of a trained machine learning model.
- Training methods such as methods 222, 500 or 800 may generate one or more noisy representations of an intended output, and then use the series of one or more noisy representations to train a denoising diffusion ML model 522 to perform a denoising operation (e.g., to remove noise from initially noisy representations).
- the output of the fully trained denoising ML model 320 may include one or more generated denoised representations, which may be used to define one or more aspects of a patient’s post-treatment anatomy 324 (e.g., the appearance of the patient’s face in combination with the target dentition), or may be used as a part of one or more other digital oral care treatments.
- a training dataset may be generated or refined by successively modifying one or more 2D or 3D representations of the patient’s dentition (e.g., the post-treatment dentition), or of the patient’s face (e.g., a 2D photo with a randomized mask applied to remove aspects of the face which include the mouth).
- a partially trained denoising ML model 214 (e.g., a U-Net or another hierarchical neural network feature extraction module (HNNFEM)) may be trained to perform the denoising operation.
- the resulting fully trained denoising ML model 320 may be deployed for clinical treatment of patients (e.g., to show the patient a prediction of what they’ll look like after the completion of treatment, and do so in approximately real-time).
- one or more representations of the training dataset may be modified, such as through the addition of noise (e.g., Gaussian noise).
- salt-and-pepper noise may be added to a masked photograph of the patient (or an image of the patient’s post-treatment dentition), resulting in hundreds, thousands or tens of thousands of incrementally noisier images.
- one or more aspects of one or more representations of the training dataset may be encoded into one or more latent representations, and then noise may be added to the one or more latent representations, to produce one or more noisy latent representations.
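- the forward noising step described above can be illustrated with a short sketch. The following is a minimal, generic DDPM-style forward process rather than the patent's specific implementation; the linear beta schedule, step count, and latent shape are illustrative assumptions.

```python
import torch

def make_linear_schedule(num_steps: int = 1000) -> torch.Tensor:
    """Linear beta schedule; returns the cumulative product of (1 - beta)."""
    betas = torch.linspace(1e-4, 0.02, num_steps)
    return torch.cumprod(1.0 - betas, dim=0)

def q_sample(x0: torch.Tensor, t: torch.Tensor, alphas_cumprod: torch.Tensor):
    """Draw x_t ~ q(x_t | x_0): mix the clean latent with Gaussian noise."""
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(-1, *([1] * (x0.dim() - 1)))
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
    return x_t, noise  # the noise tensor doubles as the training target

# Example: noising a batch of (hypothetical) latent representations of
# masked face photos at randomly chosen diffusion timesteps.
alphas_cumprod = make_linear_schedule()
latents = torch.randn(8, 4, 64, 64)
t = torch.randint(0, 1000, (8,))
noisy_latents, gt_noise = q_sample(latents, t, alphas_cumprod)
```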
- the one or more oral care arguments may include at least one of a real value, a categorical value or a natural language text value.
- the one or more oral care arguments may include at least one of an oral care metric, or an oral care parameter.
- the partially trained denoising ML model 214, or fully trained denoising ML model 320 may include one or more neural networks (e.g., an encoder-decoder structure, etc.).
- An encoder-decoder structure may comprise at least one encoder or at least one decoder.
- Non-limiting examples of an encoder-decoder structure include a UNet, a transformer, a pyramid encoder-decoder, or an autoencoder, among others.
- One or more 3D representations of the patient’s dentition may be provided to the denoising diffusion methods of this disclosure.
- the one or more 3D representations of the patient’s dentition may be encoded into one or more latent representations.
- the one or more denoised representations (e.g., 3D oral care representations generated by fully trained diffusion ML model 320) may be used to define aspects of the patient's post-treatment anatomy, or as part of other digital oral care operations.
- the denoising diffusion techniques described herein may, in some instances, be practiced in combination with other operations in digital oral care.
- a patient's dentition may be scanned in a clinical environment (or clinical context), resulting in a pre-segmentation mesh of the teeth and gums.
- the mesh may undergo validation to identify any scanning defects, undergo mesh cleanup to fill holes and correct flaws, and then undergo segmentation.
- the segmented tooth meshes and corresponding maloccluded transforms may be provided to an automated orthodontic setup prediction model, which may generate a predicted final setup.
- the predicted final setup (e.g., comprising 3D tooth meshes which are placed in their final setup poses) may describe the patient’s target dentition 302.
- the target dentition 302 may subsequently be provided to the smile prediction methods of this disclosure, which may combine the digital representation of the patient’s target dentition 302 (e.g., either 2D or 3D) with a digital representation of the patient’s face 300 (e.g., either 2D or 3D) using fully trained denoising diffusion ML model 320.
- Other combinations of the techniques described herein should also be considered within the scope of this disclosure.
- Techniques of this disclosure relate to a computer-implemented method for generating predictions of post-treatment dental anatomy for a patient.
- Representations of the patient's pre-treatment dental anatomy, which include exposed teeth, as well as representations of the desired post-treatment dentition may be provided to the techniques.
- the techniques may generate noisy first representations of the intended output, which may be denoised by using a trained flow-based machine learning model (e.g., a denoising diffusion ML model). This process yields second representations of the intended output and automatically defines various aspects of the patient’s post-treatment anatomy.
- Techniques of this disclosure may optimize camera parameters which may be used to register two or more dentitions, to prepare those dentitions to be provided to ML models of this disclosure.
- the optimization methods may optimize the alignment of the upper and lower 3D arches of the patient’s dentition (e.g., by optimizing translation or orientation values that control the alignments of the upper and lower arches).
- the camera parameters can also be optimized (e.g., using a genetic algorithm, etc.). Either camera parameters, or arch alignments (or both) may be optimized, according to particular implementations.
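- a genetic-algorithm search over camera parameters might look like the sketch below. The fitness signal (intersection-over-union between a projected dentition mask and a mask derived from the patient photo, cf. FIG. 22) and the six-value parameter layout are assumptions; `render_mask` is a hypothetical stand-in for whatever projection routine a given implementation uses.

```python
import numpy as np

def iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union of two binary masks (the fitness signal)."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / float(union) if union else 0.0

def optimize_camera(photo_mask, render_mask, pop=32, gens=50, dim=6, sigma=0.1):
    """Evolve camera-parameter vectors (e.g., 3 rotation + 3 translation
    values) so the projected 2D dentition mask overlaps the photo mask."""
    rng = np.random.default_rng(0)
    population = rng.normal(size=(pop, dim))
    for _ in range(gens):
        fitness = np.array([iou(render_mask(p), photo_mask) for p in population])
        elite = population[np.argsort(fitness)[-pop // 4:]]          # top 25%
        parents = elite[rng.integers(0, len(elite), size=(pop, 2))]  # pair up
        child = np.where(rng.random((pop, dim)) < 0.5,
                         parents[:, 0], parents[:, 1])               # crossover
        population = child + rng.normal(scale=sigma, size=(pop, dim))  # mutate
    fitness = np.array([iou(render_mask(p), photo_mask) for p in population])
    return population[int(fitness.argmax())]
```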
- the methods further incorporate the use of oral care arguments and camera parameters to enhance the accuracy of the predictions.
- the techniques include steps for encoding pre-treatment and post-treatment representations of patient dentitions into latent representations, processing 2D images and 3D representations of teeth, and utilizing tooth image masks for refining the predicted shapes of teeth. Additionally, the methods employ optimization techniques to refine camera parameters and cost functions to ensure the precision of the generated data structures (e.g., 3D meshes or 2D images of post-treatment patient appearances).
- a post-treatment appearance can include aspects of the full face, partial face, and/or dentition of the patient.
- the methods may automatically generate orthodontic setups or tooth restoration designs using machine learning models and provide functionality for determining confidence scores to validate tooth identifications.
- the methods offer a comprehensive and efficient approach to predicting a patient's post-treatment dental anatomy, thereby improving the quality of dental care and treatment planning.
- FIG. 1 shows a method for preparing the patient’s face data and the patient’s final setup dentition to be used in training a denoising diffusion probabilistic model for smile prediction.
- FIG. 2 shows a method for training a denoising diffusion probabilistic model to combine a representation of the patient’s post-treatment dental anatomy with a representation of the patient’s face.
- FIG. 3 shows a method for using a fully trained and deployed denoising diffusion probabilistic model to combine a representation of the patient's post-treatment dental anatomy with a representation of the patient's face.
- FIG. 4 shows a method of generating a color palette which can be used to influence the color of one or more teeth in the predicted smile.
- FIG. 5A-1 shows a method of training a flow-based model to generate a predicted smile (with images that show an example of dental restorative treatment).
- FIG. 5A-2 shows a method of training a flow-based model to generate a predicted smile.
- FIG. 5B shows images with an example of predicting a smile using a method for orthodontic treatment.
- FIG. 6 shows a method of using a fully trained flow-based model to generate a predicted smile for orthodontic treatment.
- FIG. 7 shows a method of using a fully trained flow-based model to generate a predicted smile for dental restorative treatment.
- FIG. 8 shows a method of training a denoising diffusion ML model.
- FIG. 9 shows a method of using a fully trained denoising diffusion ML model.
- FIG. 10 shows a method of preparing a patient’s dentition for registration.
- FIG. 11 shows a method of preparing a patient’s dentition for registration.
- FIG. 12 shows a method of segmenting a patient’s dentition.
- FIG. 13 shows a method of generating a smile mask.
- FIG. 14 shows a method of applying a smile mask to a patient's dentition.
- FIG. 15 shows a method of applying a smile mask to a patient's dentition.
- FIG. 16 shows a method of applying a smile mask to a patient's dentition.
- FIG. 17 shows a 2D image of a patient’s smile, and an associated segmentation mask.
- FIG. 18 shows a 3D mesh of the patient’s dentition, a smile mask and a 2D projection of the patient’s dentition.
- FIG. 19 shows subset mask images that were generated using initial or optimized camera parameters.
- FIG. 20 shows 2D dentition masks.
- FIG. 21 shows patient dentitions before and after registration is performed.
- FIG. 22 shows a method of computing the fitness of a population member in a genetic algorithm.
- Diffusion models may be applied to 2D image generation, or the generation of 3D representations, among others.
- such implementations may take input that describes an intended outcome or a post-treatment anatomy (e.g., natural language text, real values, categorical values, reference images or reference 3D representations).
- the techniques described herein expand denoising diffusion models into the digital oral care space.
- the techniques described herein use a denoising diffusion model to combine a patient’s post-treatment dental anatomy with a representation of the patient’s face.
- the combining may be conditioned, at least in part, on provided oral care arguments (e.g., which may include natural language text, integer arguments, real-valued arguments, categorical arguments, and the like).
- Such oral care arguments may contain one or more attributes describing an intended output from a trained machine learning model.
- Techniques described herein may, in some instances, take as input representations of the patient’s post-treatment dentition (e.g., 3D point clouds, or 2D views of 3D representations), which are to be used as guides for the generation of one or more post-treatment renderings of the patient’s face.
- the renderings can be used in treatment planning by clinicians and can help patients decide what kind of treatment they’d like to receive.
- Techniques described herein may, in some instances, take as input representations of the patient’s face, or of the patient’s post-treatment target dentition.
- these inputs may be encoded into latent representations by autoencoders or by other encoder-decoder neural network structures (e.g., a representation generation module).
- a latent representation may include an information-rich and/or reduced-dimensionality form of the original data.
- techniques of this disclosure may realize improved data precision through the use of such latent representations.
- denoising diffusion ML models of this disclosure (e.g., model 214, model 320, model 522, model 608, and model 708, etc.) are better able to learn the distribution of the data within the latent representations because of the smaller size of the latent representations (e.g., relative to the size of the data in the data's original form).
- Method 132 in FIG. 1 describes the preparation of tuples of patient data (with optional augmentation) which may be used by method 222 in FIG. 2 to train a denoising diffusion probabilistic model 214.
- FIG. 3 shows method 326 which uses a fully trained denoising ML model 320 to combine a photo (or mesh) of the patient’s face with a digital representation of a predicted post-treatment dentition (e.g., to generate one or more predicted smiles).
- the training of a partially trained denoising diffusion model 214 may involve a forward pass 220 over the input data.
- the input data may include text-based oral care arguments 200 (e.g., instructions for clinicians), non-text oral care arguments 202, or one or more tuples 204 (e.g., a photo of the patient’s face, a masked photo of the patient’s face, and/or a latent representation of the patient’s target dentition, among other inputs described herein).
- optional augmentation may be applied to one or more fields of the tuple 204, wherein the tuple may contain (e.g., an augmented photo of the patient’s face, an augmented masked photo of the patient’s face, or a latent representation of the augmented patient’s dentition).
- the forward pass 220 may add small amounts of noise to the tuple data 204, generating a Markov chain of steps 216.
- the inputs 200, 202, or 204 may be combined (212) and then provided to the partially trained denoising ML model 214, to condition the model 214 on the patient’s data and/or the treatment instructions.
- Text-based oral care arguments 200, or non-text oral care arguments 202, may include treatment instructions from clinicians.
- the forward pass 220 may generate training data.
- the Markov chain may comprise a set of successively noisier training data examples.
- Partially trained denoising diffusion ML model 214 may be trained to generate noise tensors which may be subtracted from inputs, in order to perform the denoising operation.
- Input oral care arguments 202 may influence the functioning of the denoising diffusion models 214 or 320, causing the denoising diffusion models 214 or 320 to generate output to the specification of the clinician (e.g., enabling the customization of the output that is generated by the denoising diffusion models 214 or 320).
- Oral care arguments 202 may include oral care parameters, oral care metrics, among other examples described herein.
- Oral care arguments 202 may include: categorical information such as patient age or gender, or other embeddings containing medical data (e.g., pertaining to diagnoses, etc.).
- Text-based oral care arguments 200 may contain natural/colloquial language embeddings.
- the masked representation of the patient’s face may be encoded (210) into latent form.
- a latent encoding module may be trained to encode data into a reduced-dimensionality latent form.
- Examples of data which may be encoded include one or more 3D point clouds or 3D meshes describing the patient’s dentition, 2D photos of the patient’s face, 3D representations of the patient’s face, or 2D renderings of the patient’s dentition, among others. Such data may be encoded into a latent vector or latent capsule. Oral care arguments 200 may be encoded (206) in latent representations. Oral care arguments 202 may be encoded (208) into latent representations. The inputs 200, 202 or 204 (or latent representations thereof) may be combined (212) and then be used to condition the output of the partially trained denoising ML model 214.
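- as a concrete illustration of the combination step (212), the sketch below projects each latent input to a common width and concatenates the results into a single conditioning tensor; the dimensions and the use of simple linear projections are assumptions, not the patent's architecture.

```python
import torch
import torch.nn as nn

class ConditioningCombiner(nn.Module):
    """Encode each input into a common latent width, then concatenate into
    one conditioning tensor for the denoising model. Widths are assumed."""
    def __init__(self, text_dim=768, args_dim=32, image_dim=512, latent_dim=256):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, latent_dim)    # text arguments 200
        self.args_proj = nn.Linear(args_dim, latent_dim)    # non-text arguments 202
        self.image_proj = nn.Linear(image_dim, latent_dim)  # masked-face latent

    def forward(self, text_emb, args_vec, image_latent):
        parts = [self.text_proj(text_emb),
                 self.args_proj(args_vec),
                 self.image_proj(image_latent)]
        return torch.cat(parts, dim=-1)

combiner = ConditioningCombiner()
cond = combiner(torch.randn(1, 768), torch.randn(1, 32), torch.randn(1, 512))
```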
- the one or more noisy representations of the patient’s face may be provided to the partially trained denoising diffusion ML model 214, which may generate a predicted representation of the patient’s post-treatment appearance. Loss may be computed (218) between the predicted representation and a corresponding ground truth representation (e.g., the original photo of the patient, which is provided as a part of the tuple 204). The loss may be used to further train, at least in part, the partially trained denoising diffusion ML model 214. In some implementations, partially trained denoising ML model 214 may generate one or more predicted noise tensors. The one or more predicted noise tensors may be subtracted from the inputs.
- the predicted noise tensors may be removed from the inputs, or the inputs may be denoised.
- loss may be computed as the difference between the predicted noise tensors and corresponding ground truth noise tensors which are provided by the Markov chain 216.
- the Markov chain 216 may generate a succession of increasingly noisy versions of the input data 204, which may then be used to train, at least in part, a denoising ML module 214 (e.g., which may be used in deployment as a part of the reverse pass 322).
- the denoising ML model 320, which is trained for use in reverse pass 322, may include one or more neural networks (e.g., U-Net, VAE, 3D SWIN transformer, pyramid encoder-decoder, or the like) and may be trained to denoise a highly noisy version of the data from the Markov chain 318.
- the denoising ML model 320 may iteratively remove noise from a randomized data structure (e.g., a 2D image containing Gaussian noise or a 3D point cloud or mesh with randomized mesh elements). Stated another way, the initially noisy data structure may be denoised by fully trained denoising diffusion ML model 320 until the data structure converges on a final state, and is output as a predicted smile 324 (e.g., a predicted post-treatment photo of the patient). In deployment, a pre-treatment 2D photo (or 3D representation) 300 of the patient’s face may undergo latent encoding (308) and be provided to a combination module 316.
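- the reverse pass can be illustrated with a generic DDPM-style ancestral sampling loop, sketched below; the `denoiser` call signature, the noise schedule, and the conditioning interface are assumptions rather than the patent's implementation.

```python
import torch

@torch.no_grad()
def reverse_pass(denoiser, cond, shape, alphas_cumprod):
    """Start from pure Gaussian noise and iteratively remove the noise
    predicted by the trained denoiser, conditioned on the patient data
    (masked photo latent, target dentition, oral care arguments)."""
    num_steps = len(alphas_cumprod)
    prev = torch.cat([torch.ones(1), alphas_cumprod[:-1]])
    betas = 1.0 - alphas_cumprod / prev
    x = torch.randn(shape)                       # randomized starting structure
    for t in reversed(range(num_steps)):
        a_bar, beta = alphas_cumprod[t], betas[t]
        eps = denoiser(x, torch.tensor([t]), cond)         # predicted noise
        x = (x - beta / (1.0 - a_bar).sqrt() * eps) / (1.0 - beta).sqrt()
        if t > 0:
            x = x + beta.sqrt() * torch.randn_like(x)      # stochastic step
    return x  # converged denoised representation (e.g., a predicted smile)
```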
- a digital representation (either 3D representation or 2D representation) of the patient’s predicted post-treatment target dentition 302 may undergo latent encoding (310) or (118), and then be provided to combination module 316.
- Text-based oral care arguments 304 may undergo latent encoding (312), and be provided to combination module 316.
- non-text based oral care arguments 306 may undergo latent encoding (316) and be provided to the combination module 316.
- the combination module may perform concatenations or additive operations on its inputs and provide its output to the fully trained denoising ML model 320, to condition that model on the patient’s data and/or the treatment instructions from clinicians.
- a loss function (e.g., cross-entropy or mean squared error (MSE), among others) may be computed (218) to quantify the differences between a generated 3D oral care representation and a corresponding ground truth (or reference) 3D oral care representation.
- the loss function may be used to train, at least in part, the denoising ML model 214.
- the forward pass 220 of the denoising diffusion model may generate a training dataset of increasingly noisy examples of the input data.
- Noise may be introduced to disfigure the input data (e.g., 2D or 3D representations of the patient’s face and/or post-treatment dentition, etc.) and those noisy examples may be used, at least in part, to train a denoising diffusion machine learning model 214 to reverse this noise-introducing process (e.g., in model deployment).
- the reverse pass 322 may be executed to reconstruct the pristine input data by removing the noise from a noisy example of that input data.
- the denoising diffusion model is capable of generating new 3D oral care representations (e.g., a photo of the patient with post-treatment dentition, among others) by passing a noisy data example (e.g., a photo of the patient with the mouth area masked-out) through the denoising diffusion process 320 (i.e., the reverse process 322).
- Examples of hierarchical neural network feature extraction modules include 3D SWIN Transformer architectures, U-Nets or pyramid encoder-decoders, among others.
- a HNNFEM may be trained to generate multi-scale voxel (or point) embeddings of a 3D representation (or multi-scale embeddings of other mesh elements described herein).
- a HNNFEM of one or more layers (or levels) may be trained on 3D representations of patient dentitions to generate neural network feature embeddings which encompass global, intermediate or local aspects of the 3D representation of the patient’s dentition.
- a cohort patient case may include a set of tooth crown meshes, a set of tooth root meshes, a photograph of the patient (e.g., with pre-treatment dentition), a representation of a post-treatment predicted dentition (e.g., a final setup), or a data file containing attributes of the case (e.g., a JSON file).
- a typical example of a cohort patient case may contain up to 32 crown meshes (e.g., which may each contain tens of thousands of vertices or tens of thousands of faces), up to 32 root meshes (e.g., which may each contain tens of thousands of vertices or tens of thousands of faces), multiple gingiva meshes (e.g., which may each contain tens of thousands of vertices or tens of thousands of faces) or one or more JSON files which may each contain tens of thousands of values (e.g., objects, arrays, strings, real values, Boolean values or Null values).
- FIG. 1 describes a method to prepare tuples 130 for use in training the denoising ML model 214.
- the patient’s pre-treatment face photo 100 (or a 3D representation of the patient’s face - with optional color data associated with the mesh elements) may be provided as input data.
- Facial landmarks (e.g., corresponding to the mouth or lower face) may be identified, and one or more masks 104 may be generated (102) from the landmarks.
- the one or more masks 104 and the face photo 100 may be augmented (106).
- Augmentation (106) may include one or more of the following operations on either 2D or 3D data: flips, warps, rotations, introduction of Gaussian noise, changing colors, among other operations.
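- a minimal sketch of such 2D augmentations follows; the probabilities and magnitudes are arbitrary example values, and in practice the same geometric transforms would be applied consistently to a photo and its corresponding mask.

```python
import numpy as np

def augment_photo(rng: np.random.Generator, img: np.ndarray) -> np.ndarray:
    """Apply example augmentations from the list above: a flip, a rotation
    in 90-degree steps, Gaussian noise, and a brightness (color) change.
    Input is an H x W x 3 uint8 image."""
    out = img.astype(np.float32)
    if rng.random() < 0.5:
        out = out[:, ::-1]                              # horizontal flip
    out = np.rot90(out, k=int(rng.integers(0, 4)))      # may change aspect ratio
    out = out + rng.normal(0.0, 5.0, out.shape)         # Gaussian noise
    out = out * rng.uniform(0.8, 1.2)                   # brightness jitter
    return np.clip(out, 0, 255).astype(np.uint8)
```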
- M augmentations may be generated (116).
- the augmented masks may be applied (108) to corresponding augmented versions of the patient’s face photo 100.
- the resulting M masked and (augmented) photos may be provided to the collection of M output tuples 130.
- the original (non-masked) face photo 100 may undergo augmentation (110) and be provided to the collection of M output tuples 130.
- the 3D representation of the patient’s post-treatment target dentition 112 (e.g., a final setup, etc.) and (optional) oral care arguments 114 may be augmented (116), which may generate M clinically plausible augmentations of the patient's dentition.
- the present disclosure may augment the dentition of the patient in ways that are biologically plausible, and so fall within the distribution of the training dataset of cohort patient case data.
- modifications to the 3D tooth meshes can be performed using an encoder-decoder structure and a latent representation modification module (LRMM).
- the encoder-decoder structure working in conjunction with an LRMM can make the teeth wider/narrower, longer/shorter, change color, square corners/rounded corners, or vary some other restoration design metrics or orthodontic metrics.
- the relative poses of the teeth (e.g., poses of the teeth relative to each other) may be modified; for example, the positions or orientations of the teeth may be jittered (e.g., undergo small random changes).
- the patient’s 3D dentition may be provided to a 3D representation generation module 126 (e.g., a U-Net, or others described herein), which may generate one or more latent representations which include hierarchical neural network features of the dentition.
- the resulting latent representation may be provided to the collection of M output tuples 130.
- Method 400 may generate one or more data structures which describe the color and/or surface texture of one or more teeth of the patient's dentition 404 (e.g., one or more color palettes 416).
- the method 400 may generate (406) an initial color palette 408 (e.g., by downsampling a 2D image of the patient's dentition 404).
- the initial color palette 408 may, in some implementations, undergo modification (412).
- the color palette pixels of the initial color palette 408 which correspond to the one or more segmentation masks 414 may undergo modification (e.g., increase or decrease) in whiteness, brightness, hue, saturation, to name a few attributes.
- Color spaces can include RGB, HSV, LAB, or the like.
- the initial color palette 408 may be generated (406) by downsampling or averaging the colors of mesh elements within connected components (or within a threshold distance of each other in the mesh), among other methods.
- the segmented mesh element labels 414 may be used to designate one or more teeth or other aspects of the patient's dentition 404 for color palette generation.
- the colors of mesh elements which are designated for color palette generation may be modified (412), as described herein.
- the method 400 may output one or more modified color palettes 416, which may be provided to techniques of this disclosure to influence the colors and/or textures of predicted smiles (e.g., predicted smile 530).
- oral care arguments 402 may influence the generation of a color palette.
- oral care arguments 402 may designate one or more teeth for processing or analysis, according to techniques of this disclosure.
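- for illustration, a palette-generation step in the spirit of method 400 might downsample the masked tooth pixels into a small palette image and optionally shift it toward white, as sketched below; the palette size, chunking scheme, and whitening formula are assumptions.

```python
import numpy as np

def make_color_palette(photo: np.ndarray, tooth_mask: np.ndarray,
                       size: int = 4, whiten: float = 0.0) -> np.ndarray:
    """Average masked tooth pixels into a size x size palette, then shift
    the palette toward white by `whiten` (0..1). Assumes the mask selects
    at least size*size pixels of an H x W x 3 uint8 photo."""
    pixels = photo[tooth_mask > 0].astype(np.float32)       # N x 3
    chunks = np.array_split(pixels, size * size)            # downsample
    palette = np.stack([c.mean(axis=0) for c in chunks]).reshape(size, size, 3)
    palette = palette + whiten * (255.0 - palette)          # optional whitening
    return np.clip(palette, 0, 255).astype(np.uint8)
```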
- Method 500 may train a smile prediction ML model to generate one or more predicted smiles 530.
- a predicted smile 530 may show the patient's target dentition 542 (e.g., a lateral incisor with a clinically desired shape, among other examples) integrated into the patient's original dentition data 508.
- the patient's original dentition data 508 may include 3D tooth meshes, 3D meshes of gums, 2D photographs of teeth (or gums), 2D renderings of 3D mesh data, or the like.
- the smile prediction ML model may use one or more diffusion conditioning adapters 534, as shown in diffusion conditioning module 526, to pre-process the patient’s dentition data before the dentition data are provided to denoising diffusion ML model (e.g., to perform dentition registration as described herein).
- Dentition registration is a digital processing operation that may align two or more dentitions, so that differences between the dentitions can be quantified and used in a cost function or loss calculation.
- registration may involve comparing aspects of a representation (e.g., 2D images, 3D models, or other representation) of a first dentition with a representation of a second dentition to determine whether the first and second dentitions are related in some way.
- Method 500 may include the training of a denoising diffusion ML model 522. Further detail on the training of the denoising diffusion ML model 522 is described by method 800.
- Method 900 describes the operational use of a fully trained denoising diffusion ML model 522 (e.g., when the denoising diffusion ML model 522 is deployed as module 608 in method 600, or as module 708 in method 700, or the like).
- the method 500 may train a smile prediction ML model to generate one or more smile predictions 530 which have teeth whose poses or appearances that are influenced by one or more oral care arguments, one or more color palettes, or one or more target dentitions 542.
- Target dentition 542 may include 3D meshes (e.g., containing colors and/or textures, or geometric shape information), or 2D images (e.g., containing 2D image masks or detected edges with defined tooth shapes, among others) that illustrate intended modifications to make to the patient's dentition.
- the method 500 may generate one or more predictions of the patient's full face, one or more predictions of the patient's mouth, or may include other parts of the patient's anatomy.
- an 'area of interest' 512 is specified or generated (504) which circumscribes a portion of dental anatomy which is to be in-painted into a segmentation region 540 (e.g., which is described by mesh element labels in 3D or image masks in 2D).
- One goal of training a smile prediction ML model using training method 500 is to train the smile prediction ML model how to in-paint arbitrary objects into masked regions (e.g., to in-paint aspects of target dentition 542 into a 2D or 3D representation of the patient's mouth or the patient's face), such as the masked region shown in masked dentition 506.
- one or more teeth of the patient's original dentition 508 may undergo segmentation (510), which may generate 2D image masks (or 3D mesh element labels) 540, which may then be applied to the patient's dentition 508, resulting in masked dentition 506.
- the one or more 2D image masks (or 3D mesh element labels) 540 may undergo augmentation (528), for the purpose of training denoising diffusion ML model 522 to respond to differently shaped image masks (e.g., to generate a custom smile containing realistic-looking teeth whose shapes and/or poses have been influenced by the provided image masks).
- the one or more target image masks may be provided to the fully trained denoising diffusion ML model 708 and used as reference representations, enabling the model 708 to generate a realistic predicted smile in which the one or more treated teeth have assumed the shapes defined by the respective one or more target image masks.
- the resulting teeth have color, transparency and/or specular reflections which look realistic (e.g., look consistent with the other teeth of the patient’s dentition).
- This method of influencing the shapes (or poses) of teeth in the predicted smile is generally used to generate teeth with clinically and/or aesthetically desired shapes (or poses) which are biologically plausible.
- target tooth masks which are not biologically typical may be provided to denoising diffusion ML model 708.
- target image masks may be generated for the left and right upper cuspids which have exaggerated lengths and/or exaggerated pointiness, so as to influence the denoising diffusion ML model 708 to generate a predicted smile for a human subject where the human subject is given the teeth of a vampire or other fanciful creature (e.g., for use in a work of art, or in a work of entertainment such as a film).
- method 500 may train a smile prediction ML model to in-paint an image of a target tooth anatomy 542 into a photograph of a patient's face in a manner that is aesthetically pleasing and/or clinically plausible (e.g., the colors, reflections, and/or shapes of the teeth are realistic, etc.).
- target tooth anatomy 542 include post-restoration teeth or post-treatment final setups for orthodontics.
- the 'area of interest' 512 may be used to define the portion of a post-treatment target dentition 542 that is to be realistically integrated with a photograph of the patient's face, where the patient's face photo initially shows the pre-treatment dentition.
- a segmentation image mask 540 is generated (510) which designates a portion of the patient's pre-treatment photo 508 into which the target dentition image 542 is to be in-painted or otherwise realistically integrated by the fully trained smile prediction ML model.
- the 'area of interest' 512 may be used to define a portion of the patient's original dentition 508 which is to be provided to the denoising diffusion ML model 522, as a substitute for the patient's post-treatment dentition.
- the masked dentition 506 may be provided to the denoising diffusion ML model 522 to train the denoising diffusion ML model 522 to learn the distribution of patient dentitions 508.
- diffusion conditioning adapter 534 may process the patient’s dentition data (e.g., 3D or 2D data), to clarify and/or strengthen the signal in those data, and/or prepare those data to be provided to the denoising diffusion ML model 522. For example, when the patient’s dentition includes 2D image data, diffusion conditioning adapter 534 may generate a 2D image containing an outline of one or more target tooth shapes (e.g., generated by applying an edge detector, such as the Canny edge detector or Sobel edge detector, and then optionally modifying the contours of those detected edges).
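- as an example of the edge-detection step, the sketch below uses OpenCV's Canny detector to produce tooth outlines and then dilates them slightly so the contours read as closed target shapes; the thresholds, kernel, and file names are illustrative.

```python
import cv2
import numpy as np

# Illustrative conditioning-adapter step: extract contours from a 2D image
# of the dentition (assumed to exist at "dentition.png"), then thicken them.
img = cv2.imread("dentition.png", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(img, threshold1=50, threshold2=150)
edges = cv2.dilate(edges, np.ones((3, 3), np.uint8), iterations=1)
cv2.imwrite("tooth_outlines.png", edges)
```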
- one or more image masks which describe the target poses or target shapes of one or more teeth may be provided to denoising diffusion ML model 522, and may control the resulting shapes or poses of respective one or more teeth in the predicted smile 530.
- diffusion conditioning module 534 may perform method 1100 to generate one or more dentition masks 1114.
- the one or more dentition masks 1114 may contain one or more registered projected 2D dentition mask images (i.e., one or more images that have undergone a registration process defined herein), which may contain connected components. Each connected component may describe the target shapes and/or target poses for one or more teeth.
- Denoising diffusion ML model 522 may denoise the connected components, to transform those flat shapes into realistic-looking teeth with colors, textures, specular reflections, or shadows that are consistent with other teeth of the patient’s dentition, and/or are consistent with the distribution of teeth in the training dataset.
- the output of diffusion conditioning adapter 534 may be provided to encoder 536 to encode the output (e.g., 2D image masks, 3D representations, etc.) into one or more latent representations, and then the one or more latent representations may be provided to the denoising diffusion ML model 522 (e.g., or another flow-based ML model).
- diffusion conditioning adapter 534 may compute edges, perform registration or perform image sharpening (or perform other operations described herein), the results of which may be provided to encoder 536.
- the denoising diffusion ML model 522 may then generate a predicted smile 530, where one or more teeth have the shapes of the one or more tooth image masks (or tooth outlines). Stated another way, the shapes of one or more teeth in the predicted smile 530 may be influenced by the contours of the connected components in an image mask.
- the image masks (or edge- detected image outlines) may specify the intended shapes of one or more teeth.
- the method 500 may generate a predicted smile in which one or more treated teeth have the color, translucency, specular reflections, and/or texture of the other teeth of the patient (e.g., as specified by color palette 532).
- a color palette 532 may be generated by module 538 (e.g., according to method 400), and then be provided to diffusion conditioning module 526.
- the color palette 532 may specify color modifications, or modification to the whiteness of one or more teeth in the original patient dentition data 508.
- target patient dentition data 542 may include one or more 3D meshes (or other 3D representation described herein), each of which has the intended target shape (or color, texture, or pose, etc.) of a tooth.
- Such target patient dentition data 542 may be provided to conditioning adapter 534, which may process the patient's dentition to make the data cleaner or otherwise improve the efficient use of the dentition data by the denoising diffusion ML model 522 (e.g., by performing segmentation, mesh cleanup, downsampling, upsampling, registration between current and target dentitions, or the like).
- the target patient dentition data 542 may be provided directly to encoder 536, which may generate one or more latent representations (e.g., embedding vectors, latent vectors, latent capsules, or the like).
- the 3D mesh of the patient's dentition data 508 or the target dentition data 542 (e.g., one or more tooth meshes) may be provided to conditioning adapter 534, and then be encoded into one or more latent representations by encoder 536.
- the one or more latent representations may then be provided to concatenation module 524, or be directly provided to denoising diffusion ML model 522.
- Denoising diffusion ML model 522 may then generate one or more smile predictions 530 (e.g. 2D or 3D predictions).
- the prediction may, for example, comprise a 2D image of the patient's smile where the specified one or more teeth (e.g., one or more teeth specified by oral care arguments 502) have assumed the poses or appearances specified by the corresponding teeth in target dentition 542.
- any aspects of the patient's teeth (e.g., the 3D geometric information of a 3D tooth mesh, or the tooth's surface color and/or texture) may be processed by separate conditioning adapters, and/or subsequently encoded by one or more encoders 536.
- a conditioning adapter 534, encoder 536, and/or a denoising diffusion ML model 522 may be trained end-to-end, so that the models are trained concurrently, sometimes using the same one or more loss functions.
- Other unrolled or interconnected ML models of this disclosure may also be trained in an end-to-end manner.
- Various inputs (e.g., tooth transforms, 2D images, 3D meshes of crowns, 3D meshes of roots, color palettes, or other information) may be provided to a diffusion conditioning adapter 534 (e.g., T2I adapters, among others).
- Diffusion conditioning module 526 may contain one or more pairs of diffusion conditioning adapter 534 and encoder 536.
- edges may be generated using, for example, a Canny edge detector, a Sobel edge detector, or another type of edge detector.
- the edges may reveal the contours of teeth, gums, lips, or other aspects of the patient's dentition, mouth or facial appearance.
- the edges (or 2D image masks) that describe the contours of one or more teeth may be modified to define new target shapes.
- the images which are generated by edge detection may be provided to aggregation module 524, which may aggregate or concatenate one or more latent representations. The concatenated latent representations may then be provided to denoising diffusion ML model 522.
- the denoising diffusion ML model 522 can, in some implementations, be replaced with other flow-based ML models, such as continuous normalizing flows. In some implementations, the flow-based models of this disclosure may be trained, at least in part, using flow matching.
- denoising diffusion ML model 522 may include an encoder-decoder structure (e.g., a U-Net, Vision Transformer (ViT), or others described herein).
- image processing filters may be applied by diffusion conditioning adapters 534 to images of the patient's dentition data 508 to clarify aspects of the images and provide enhanced information about the shape and/or structure of the patient's anatomy to the denoising diffusion ML model 522.
- an example of image sharpening is Unsharp Masking (USM).
- other filters may be applied by diffusion conditioning adapter 534, such as Bas Relief or Chalk & Charcoal.
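- Unsharp Masking itself reduces to a blur-and-subtract operation, sketched here with OpenCV; the kernel size, sigma, and amount are example values.

```python
import cv2

def unsharp_mask(img, ksize=(9, 9), sigma=2.0, amount=1.0):
    """Subtract a Gaussian-blurred copy to isolate fine detail, then add
    that detail back scaled by `amount`: (1 + a) * img - a * blurred."""
    blurred = cv2.GaussianBlur(img, ksize, sigma)
    return cv2.addWeighted(img, 1.0 + amount, blurred, -amount, 0)
```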
- Denoising diffusion ML model 522 may output a predicted smile 530, which may include modified patient dentition data integrated into a 2D or 3D representation of the patient’s face.
- an area of interest 512 of the original patient dentition data 508 may be generated or specified (504).
- the area of interest 512 of the dentition data may undergo (optional) augmentation (514), which results in augmented dentition data 516.
- the area of interest 512 or the augmented data 516 may be provided to encoder 518 (e.g., a CLIP encoder, or the like), which may generate one or more latent representations, which may, in some implementations, be provided to neural network 520 (e.g., a multilayer perceptron, among others), or directly to denoising diffusion ML model 522.
- the neural network 520 may, in some implementations, change the shapes of the one or more latent representations which are generated by encoder 518, so that the one or more latent representations are properly formatted to be provided to the stable diffusion process in denoising diffusion ML module 522.
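- a sketch of such a reshaping network follows: an image-encoder embedding (e.g., a 512-dimensional CLIP-style vector) is projected by a small MLP into a token-shaped conditioning tensor. All dimensions here are assumptions.

```python
import torch
import torch.nn as nn

class LatentProjector(nn.Module):
    """Illustrative stand-in for neural network 520: reshape an encoder
    embedding into the conditioning shape a diffusion denoiser expects."""
    def __init__(self, in_dim=512, cond_tokens=4, cond_dim=768):
        super().__init__()
        self.cond_tokens, self.cond_dim = cond_tokens, cond_dim
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 1024),
            nn.GELU(),
            nn.Linear(1024, cond_tokens * cond_dim),
        )

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        out = self.mlp(emb)                                   # B x (T*D)
        return out.view(-1, self.cond_tokens, self.cond_dim)  # B x T x D

cond = LatentProjector()(torch.randn(2, 512))  # -> shape (2, 4, 768)
```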
- the output of the neural network 520 may be provided to denoising diffusion ML model 522, for the purpose of inducing denoising diffusion ML model 522 to learn how to in-paint arbitrarily shaped teeth into a masked region (e.g., masked using either a 2D image mask or using mesh element labels in a 3D mesh).
- An example of a masked region is shown in masked dentition 506.
- the training method 500 shows a pre-restoration tooth (e.g., a peg lateral) within the area of interest 512.
- the area of interest 512 would be configured to circumscribe (or otherwise designate) one or more teeth which describe the target post-restoration appearances of the patient's teeth.
- the fully trained model has been trained (via training method 500) to in-paint one or more teeth of target dentition 542 (e.g., with clinically or aesthetically idealized aspects, such as target shapes, colors, textures, or whitening) into a 2D image or 3D mesh representation of the patient's face (or mouth).
- the patient dentition data 508 may be segmented (510).
- the outputs 540 of segmentation may include one or more image masks (when patient dentition data 508 includes 2D images), or one or more mesh element labels (when the patient dentition data 508 includes 3D meshes).
- the image masks (or mesh element labels) may be augmented (528), and then be applied to the original patient dentition data 508, resulting in masked dentition data 506, which may then be provided to the denoising diffusion ML model 522.
- the purpose of the masking is to define the one or more regions of the patient's original dentition data 508 into which the post-treatment dentition 542 is to be integrated.
- original dentition data 508 can include 2D images of the patient's face and/or teeth. In some instances, original dentition data 508 can include 3D representations (e.g., 3D meshes, etc.) of the patient's teeth and/or face. According to various implementations, tooth image masks (or mesh element labels) 540, augmented tooth image masks (or augmented mesh element labels) 528, masked dentition 506, and/or the original patient dentition data 508 may be provided to the input of the denoising diffusion ML model 522. Method 800 of FIG. 8 shows additional detail for the training of the denoising diffusion ML model 522. Method 900 describes the functioning of a fully trained denoising diffusion ML model 608 or 708 in deployment.
- Denoising diffusion ML model 522 may be trained, at least in part, through the calculation (826) of one or more loss values. For example, a loss may be computed (826) that quantifies the difference between a predicted noise tensor 824 and a ground truth noise tensor 816.
- the ground truth (or actual) noise tensor 816 may, in some implementations, be computed (814) using a pseudorandom number generator to generate Gaussian noise values (or noise values which are drawn from other distributions).
- the ground truth noise tensor is added to latent representation 812, and the resulting latent representation is provided to encoder-decoder structure 822.
- Encoder-decoder structure 822 may then determine which part of the input is noise, and/or which part of the input is a meaningful signal (e.g., which part of the input pertains to the patient’s dentition). Encoder-decoder structure 822 may then output a predicted noise tensor 824.
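- a compact training step in this spirit is sketched below: a ground-truth Gaussian noise tensor is generated (cf. 814/816), added to the latent 812, the encoder-decoder 822 predicts the noise (824), and the MSE between predicted and actual noise is backpropagated (826). The `encoder_decoder(noisy, t)` signature, the schedule, and the 4D latent shape are assumptions.

```python
import torch
import torch.nn.functional as F

def diffusion_training_step(encoder_decoder, latents, t, alphas_cumprod, optimizer):
    """One illustrative optimization step for the denoising model."""
    gt_noise = torch.randn_like(latents)                # ground-truth noise (816)
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
    noisy = a_bar.sqrt() * latents + (1.0 - a_bar).sqrt() * gt_noise
    pred_noise = encoder_decoder(noisy, t)              # predicted noise (824)
    loss = F.mse_loss(pred_noise, gt_noise)             # loss computation (826)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```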
- the example in method 800 shows how stable diffusion has been customized to the task of predicting smiles. In stable diffusion the inputs undergo latent encoding (e.g., using encoder 518 or 536), and then are provided to denoising diffusion ML model 522. In some implementations, a Markov chain of increasingly noisy examples of the input data may be generated and used to train the denoising diffusion ML model 522 in how to perform the denoising operation.
- Method 800 provides additional detail on the training of denoising diffusion ML model 522.
- Oral care arguments 502 may, in some implementations, be provided to the method 500 to influence or customize the outputs 530.
- Oral care arguments 502 may include, for example, specifications of which teeth to segment (510), specifications of which teeth to treat, or specifications of which teeth are static or pontic (e.g. and may therefore be designated to not move during orthodontic treatment), among others.
- the oral care arguments 502 may specify one or more teeth which are to undergo color modification, texture modification, and/or specify the nature of the whitening (e.g., to increase or to decrease the lightness of one or more teeth).
- oral care arguments 502 may specify which teeth are to be whitened (or otherwise processed), the magnitude of whitening which is to be applied, or which other types of augmentations are to be performed.
- oral care argument 502 may specify changes to the shapes of one or more teeth (e.g., such as increasing or decreasing tooth crown length).
- oral care argument 502 may include one or more 2D image masks which may be used to define new shapes for one or more teeth.
- the oral care arguments 502 may specify the magnitude of change in length of one or more teeth (e.g., to shorten the lateral incisors, or to lengthen the cuspids, to name a couple examples).
- Oral care arguments 502 may specify one or more oral care metrics (e.g., Arch Symmetry, Proportions of Adjacent Teeth, or others described herein), which may influence the denoising diffusion ML model 522 (or 608 or 708) to generate one or more smile predictions in which the patient's teeth show shapes and/or poses which are influenced by the one or more oral care metrics.
- one or more image masks may be defined (e.g., via segmentation).
- the one or more image masks may correspond to one or more teeth which are to be modified (e.g., lengthened, or shortened, or otherwise modified in shape or pose).
- An image mask may include pixels which have the value of zero (e.g., which are to be ignored), and/or pixels which have non-zero values (e.g., which are to be processed).
- the one or more masks may themselves be augmented (528) (e.g., by lengthening or shortening each of the one or more masks, according to the desired change in the patient's dentition; see the mask-augmentation sketch below), and then the one or more augmented masks 506 may be provided to denoising diffusion ML model 522 to influence the model 522 to generate a predicted smile 530 in which the one or more teeth corresponding to the one or more masks are lengthened or shortened (or otherwise altered in shape) according to the one or more augmented masks 506.
- the resulting altered teeth look realistic, including color, shading and/or reflections which are consistent with other teeth shown in the patient's dentition 624 or 724.
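- one simple way to lengthen or shorten a binary tooth mask, sketched below, is morphological dilation or erosion with a tall, thin structuring element; this is an illustrative approach, not necessarily the augmentation (528) used in practice.

```python
import cv2
import numpy as np

def lengthen_tooth_mask(mask: np.ndarray, delta_px: int) -> np.ndarray:
    """Lengthen (delta_px > 0) or shorten (delta_px < 0) a binary tooth
    mask along the vertical axis; note dilation extends both ends."""
    kernel = np.ones((abs(delta_px) + 1, 1), np.uint8)   # tall, thin element
    if delta_px >= 0:
        return cv2.dilate(mask, kernel, iterations=1)
    return cv2.erode(mask, kernel, iterations=1)
```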
- oral care arguments 502 may specify a change in length or otherwise a change in shape (e.g., as defined by an image mask, restoration design parameters, or restoration design metrics, etc.) of one or more teeth in patient dentition data 508.
- the oral care arguments 502 may induce denoising diffusion ML model 522 to generate a 2D predicted smile which contains the specified tooth modifications.
- the predicted smile 530 may comprise 3D data, such as when the original patient dentition data 508, and/or target patient dentition data 542 comprise 3D mesh data.
- oral care arguments 502 may specify a change in pose (e.g., as defined by orthodontic procedure parameters, or orthodontic metrics, etc.) of one or more teeth in patient dentition data 508, which may induce denoising diffusion ML model 522 to generate a predicted smile 530 which contains the specified tooth modifications.
- original patient dentition data 508 may contain 2D images or 3D meshes of the patient's face or other parts of anatomy.
- oral care arguments 502 may undergo latent encoding (556), and the resulting one or more latent representations may be provided to concatenation module 524, which may provide its output to the denoising diffusion ML model 522.
- oral care arguments 502 may: 1) specify which one or more teeth are to be treated; 2) specify one or more color palettes to be used to alter the color of one or more teeth; 3) specify an increase or decrease in whiteness to apply to one or more teeth; 4) specify an amount of staining that should be removed from one or more teeth; 5) specify a diastema which is to be generated (or modified) between two adjacent teeth; 6) specify a change to the cant or pose of a tooth (e.g., specified in units of degrees or radians of rotation about one or more axes of the tooth); and 7) specify the distance one or more teeth are to be intruded or extruded (e.g., a distance in mm by which the upper canines are to be extruded for the final setup, etc.), among others described herein.
- method 500 can train a flow-based smile prediction ML model which can predict the outcome of dental restoration treatment (e.g., as seen in FIG. 5A-1 and 5A-2), orthodontic treatment (e.g., visualizations for which are shown in FIG. 5B), or other types of oral care treatments as well (e.g., surgical reconstruction, etc.).
- Orthodontic treatment may include bracket-based treatment (e.g., through the use of indirect bonding, such as the Solventum Digital Bonding Tray), or aligner-based treatment (e.g., through the use of CLARITY Aligners), among other examples.
- Dental restoration treatment may include the use of dental restoration appliances (e.g., FILTER Matrix), crowns, bridges, inlays or onlays, fillings, dental implants, dentures, among others.
- method 500 can be used to train flow-based smile prediction models for orthodontics, dental restoration, or other treatments.
- the illustration of the patient's teeth 504 in FIG. 5A-1 shows a non-limiting example where an area of interest is formed around a "peg lateral" tooth, which is designated as the target of dental restorative treatment.
- FIG. 5B shows non-limiting examples of data from method 500 when the flow-based smile prediction ML model is trained to predict the outcome of orthodontic treatment.
- Example 544 shows the original patient dentition data 508 (either 2D image or 3D mesh of the patient's mouth or full face) where the patient's teeth are maloccluded and in need of orthodontic treatment.
- Example 546 shows an image mask 540 that was generated by segmentation of the patient's face (e.g., segmentation to identify the region or area between the patient's lips).
- Example 548 shows the patient dentition data with mask applied 506.
- Example 550 shows the patient's dentition data area of interest 512 (e.g., the portion of the original patient dentition data 508 which lies underneath the image mask 540).
- Example 552 shows the registered output (as described herein) of an example conditioning module 534 which has processed the target dentition 542 to clarify or enhance aspects of the target dentition 542 (e.g., by performing image sharpening, edge detection, edge modification, segmentation, mask generation, mask modification, image noise removal, or the like).
- Registration in this context means to align one or more dentitions with one or more other dentitions.
- Alignment may be determined in a number of ways. For example, alignment may be determined by comparing certain aspects (e.g., pixels) of a first image with respective aspects of a second image to determine whether the aspects of the first and second images are substantially similar.
- the target dentition 542 contains 3D mesh data
- registration may be performed (and alignment determined) by diffusion conditioning adapter 534, so that the 3D target dentition looks natural (e.g., contains few visual artifacts) when projected into a 2D plane, for integration into the predicted smile 530.
- Example 554 shows the completion of smile prediction for orthodontic treatment, where the patient's predicted smile 530 shows the target dentition 542 realistically integrated into the region between the patient's lips in the patient's original dentition data 508.
- the method 600 may use a fully trained smile prediction ML model that operates using conditioned diffusion.
- conditioned diffusion is conditioned on 3D mesh data of the patient's dentition or 2D image data of the patient's dentition, or other examples of data described herein.
- the method 600 may use an ML model that was trained using methods 500 and/or 800 to generate smile predictions for orthodontic treatment in real-time, such as to show the patient waiting in the treatment chair what their teeth, mouth, and/or other aspects of the patient’s face will look like at the completion of orthodontic treatment (e.g., the appearance of the patient with a final setup), or to show the patient their appearance mid-treatment (e.g., the appearance of the patient during an intermediate stage of orthodontic treatment).
- the method 600 may generate a full-face prediction, or a full mouth cavity prediction that realistically shows the patient's face, lips, and/or dentition (e.g., mouth, gums and/or teeth), where the predicted target dentition 612 is realistically integrated into the cavity between the parted lips.
- the predicted smile 610 shows the target dentition 612 registered and integrated within the lips and surrounding facial structure in a manner that looks realistic (e.g., is free of shape, color, or texture artifacts that might tip-off the viewer that the image was artificially generated).
- the patient's target dentition data 612 and/or the patient’s dentition (or facial) data 624 may include 3D data or 2D data. Examples of 3D data include 3D tooth meshes, or 3D gums meshes. Examples of 2D data include 2D photos of teeth or gums, or 2D image renderings of 3D data.
- the patient’s target dentition data 612 may include mid-treatment or post-treatment setups for orthodontic treatment, or post-treatment target tooth shapes for dental restorative treatment, or the like.
- the patient's target dentition data 612 and/or the patient’s dentition data 624 (which may optionally include facial data) may be provided to diffusion conditioning module 614.
- Diffusion conditioning module 614 may modify the data in a way that enables the data to be used by denoising diffusion ML model 522 to generate a predicted smile 610.
- diffusion conditioning module 614 may process the patient's 3D tooth meshes to generate one or more 2D image masks (using techniques of this disclosure) which are aligned with the teeth in the patient's 2D facial data 624. Diffusion conditioning module 614 may output one or more latent representations, which may then be provided to aggregation module 622. Aggregation module 622 may provide an aggregated or concatenated latent representation of the patient's existing and/or target dentition data to denoising diffusion ML model 608.
- the patient's facial data 624 (e.g., a 2D photograph with the region between the patient's lips masked-off, or a 3D mesh of the patient's face with one or more mesh element labels configured to designate the region between the patient's lips, or the like) may be encoded (606) into one or more latent representations, which may then be provided to denoising diffusion ML model 608 (or a continuous normalizing flow-based model, or other flow-based model).
- the segmentation may generate one or more mesh element labels 728 which designate one or more mesh elements as belonging to particular teeth (e.g., a tooth which is to undergo dental restoration).
- the output 728 of segmentation (e.g., either image masks or mesh element labels) may be applied to the patient's dentition data.
- the patient's dentition data with mask (or mesh element labels) applied 704 may undergo latent encoding (706), and then be provided to denoising diffusion ML model 708 (e.g., a fully trained denoising diffusion ML model).
- one or more color palettes 732 may be provided to diffusion conditioning module 714, and may influence the denoising diffusion ML model 708 to generate one or more predicted smiles with patient dentition colors which are specified by the one or more color palettes 732.
- diffusion conditioning module 714 may include one or more pairs of diffusion conditioning adapter 720 and encoder 718.
- Diffusion conditioning adapter 720 may process input data (e.g., patient's original facial data 724, target dentition data 712, oral care arguments 702, a color palette 732, etc.) to enhance or clarify aspects of the signals within that input data (e.g., using mesh element labelling, image segmentation, edge detection, image sharpening, or other methods described herein), and then provide its outputs to encoder 718, which may generate one or more latent representations (or latent embeddings), which may be provided to aggregation (or concatenation) module 722.
- the image mask 728 may be modified (e.g., by changing the shape of the mask), in order to alter the shapes of one or more teeth in predicted smile (or predicted dentition) 710, which is generated by the denoising diffusion ML model 708.
- the image mask 728 may initially describe the shape or contours of a mal-formed, damaged or chipped tooth
- the image mask 728 may be modified to describe the shape of a clinically or aesthetically improved tooth (e.g., resulting in an image mask in which the target teeth have uniform shapes, free of chips, cracks, unwanted diastemas, evidence of decay, etc.).
- the one or more latent representations of the patient’s target dentition 802 may be provided directly to encoder-decoder structure 822 (e.g., a UNet).
- the one or more latent representations 812 or 802 may have a reduced dimensionality relative to the original data, and/or may be formatted so as to be suitable to be provided to encoder-decoder structure 822 (e.g., a UNet, a vision transformer (ViT), an autoencoder, a pyramid encoder-decoder structure, etc.).
- Gaussian noise may be computed (814) and formatted into one or more actual noise tensors 816.
- the one or more actual noise tensors 816 may be combined, added or otherwise integrated (818) with the one or more latent representations 812.
- a concatenated latent representation of all inputs 812 is summed (818) with an actual noise tensor 816, and then provided to encoder-decoder structure 822 (e.g., a UNet that contains skip connections, and that extracts a hierarchy of features, from global to local, from the UNet's inputs), which generates a predicted data structure describing noise (e.g., a noise tensor 824).
- Loss may be computed (826) between the predicted noise tensor 824 and the actual noise tensor 816. Among the other losses described herein, the loss may compute the mean squared error difference between the actual noise tensor 816 and the predicted noise tensor 824. The computed loss may be used to update (828) the weights of the encoder-decoder structure 822 (e.g., a UNet), until training is done (820). In some implementations, training may proceed (810) until loss drops below a threshold value, or until a target count of epochs have been completed.
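The training loop sketched above (noise computation 814, combination 818, noise prediction 824, loss 826, and weight update 828) can be illustrated with a minimal denoising-diffusion training step. The code below is an assumption-laden PyTorch sketch: the tiny convolutional network stands in for the UNet 822, and the linear noise schedule and tensor shapes are illustrative, not prescribed by this disclosure.

```python
import torch
import torch.nn as nn

# Minimal stand-in for encoder-decoder structure 822; a real system
# would use a UNet with skip connections and a timestep embedding.
class TinyNoisePredictor(nn.Module):
    def __init__(self, channels=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.SiLU(),
            nn.Conv2d(32, channels, 3, padding=1),
        )
    def forward(self, z_t, t):
        # t is ignored in this stand-in; a real UNet embeds the timestep.
        return self.net(z_t)  # predicted noise tensor (824)

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # linear noise schedule
alpha_bars = torch.cumprod(1.0 - betas, dim=0)   # cumulative noise-retention terms

model = TinyNoisePredictor()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

latents = torch.randn(8, 4, 32, 32)              # stand-in for latent representations 812
t = torch.randint(0, T, (latents.shape[0],))
noise = torch.randn_like(latents)                # actual noise tensor (816)
ab = alpha_bars[t].view(-1, 1, 1, 1)
z_t = ab.sqrt() * latents + (1 - ab).sqrt() * noise  # combine noise with latents (818)

pred_noise = model(z_t, t)                           # predicted noise tensor (824)
loss = nn.functional.mse_loss(pred_noise, noise)     # MSE loss computation (826)
opt.zero_grad(); loss.backward(); opt.step()         # weight update (828)
```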
- the fully trained denoising diffusion ML model 522 is shown in method 600 as module 608, and in method 700 as module 708.
- Method 900 describes the operational use of that fully trained model, for example, in a time-constrained setting (e.g., while the patient waits in the treatment chair).
- method 900 shows the operational use of the fully trained denoising diffusion ML models 608 or 708.
- various input data may be provided to denoising diffusion models 608 or 708, such as the patient's dentition with mask applied 906, oral care arguments 926, or mask data 904 (e.g., one or more image masks, or one or more mesh element labels), among others described herein.
- any of these inputs may be provided to one or more encoders 908 to encode those inputs into one or more latent representations 912.
- the denoising ML model may iterate (910) for M iterations, until a target accuracy is achieved, or until the outputs are otherwise deemed to be clinically or aesthetically suitable.
- a concatenated latent representation of all inputs 912, a latent representation of the patient's target dentition 928, and/or a noisy image (or noisy 3D representation) 902 may be provided to encoder-decoder structure 914 (e.g., a UNet), which may generate predicted noise tensor 916.
- Predicted noise tensor 916 may be removed (920) (e.g., by subtraction between matrices or tensors, etc.) from the latent representation 912 (e.g., a latent vector or latent embedding, etc.).
- the denoised latent representation may be provided to decoder 922, which may reconstruct the denoised latent representation into one or more 2D images or 3D representations which are suitable for clinical treatment of the patient (e.g., for smile prediction).
- the resulting predicted smile 924 may be outputted for aesthetic prediction or clinical treatment planning.
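A minimal sketch of the reverse (denoising) loop of method 900 follows, assuming a standard DDPM update rule. The model and schedule are placeholders; a production system would concatenate the conditioning latents 912 into the model input, as described above.

```python
import torch

@torch.no_grad()
def denoise(model, z_T, betas):
    """Sketch of the reverse (denoising) pass of method 900.

    `model` predicts the noise present in a latent (cf. predicted noise
    tensor 916); `z_T` is the initial noisy latent (902). Conditioning
    latents (912) would normally be concatenated into the model input.
    """
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    z = z_T
    for t in reversed(range(len(betas))):
        pred_noise = model(z, torch.tensor([t]))          # predicted noise tensor 916
        # Remove the predicted noise (920) per the DDPM posterior mean.
        coef = betas[t] / (1 - alpha_bars[t]).sqrt()
        z = (z - coef * pred_noise) / alphas[t].sqrt()
        if t > 0:
            z = z + betas[t].sqrt() * torch.randn_like(z) # stochastic term
    return z  # denoised latent, ready for decoder 922
```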
- mesh element feature vectors may be computed (e.g., a mesh element feature vector may be computed for each 3D point, 3D vertex, 3D edge, 3D voxel, or 3D face of a 3D representation).
- the mesh elements and corresponding mesh element feature vectors of the 3D input data may be provided to the neural networks of this disclosure (e.g., encoder 536, or UNet 822, among others), to improve the ability of those neural networks to encode the distribution of the 3D input data (e.g., enable the neural networks to better encode the shapes and/or structures of those 3D representations).
- the structure of a 3D representation may include aspects of the 3D representation such as: 1) which mesh elements are adjacent to each other; 2) which mesh elements are connected to each other; 3) which mesh elements are within a threshold Euclidean distance of each other; and/or 4) which mesh elements are within a particular count of connections of each other, or the like.
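The structural queries enumerated above can be computed directly from mesh data. The following sketch, with hypothetical random arrays standing in for a tooth mesh, shows one way to compute items 2) and 3) using NumPy and SciPy.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical vertex array of a tooth mesh: (N, 3) xyz coordinates.
vertices = np.random.rand(1000, 3)

# 3) mesh elements within a threshold Euclidean distance of each other.
tree = cKDTree(vertices)
pairs = tree.query_pairs(r=0.05)  # set of (i, j) index pairs closer than 0.05

# 1)/2) adjacency and connectivity from faces: two vertices are connected
# if they share an edge of some triangle.
faces = np.random.randint(0, 1000, size=(500, 3))  # stand-in triangle indices
edges = set()
for a, b, c in faces:
    edges.update({(min(a, b), max(a, b)),
                  (min(b, c), max(b, c)),
                  (min(a, c), max(a, c))})
```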
- diffusion conditioning adapter 534 may register the patient's target dentition data 1002 (e.g., 2D or 3D data) with the patient's original dentition data 1202 (e.g., 2D or 3D data) as follows.
- Diffusion conditioning adapter 534 may use method 1100 to segment (and/or project) (1008) target dentition 1002 into dentition mask 1114 (e.g., registered projected 2D dentition mask image).
- Method 1100 may use optimized camera parameters 1104 to perform (1008) the projection operation (e.g., using raycasting).
- Diffusion conditioning adapter 534 may use method 1600 to apply (1602) a patient smile mask 1308 to a dentition mask 1114, resulting in dentition mask 1604 (e.g., registered projected 2D dentition mask with smile mask applied image). Diffusion conditioning adapter 534 may output dentition mask 1604 (e.g., registered projected 2D dentition mask with smile mask applied image), which may be provided to denoising diffusion ML model 522, 608, or 708.
- Optimized camera parameters 1104 may be generated using optimization techniques described herein.
- when the patient's target dentition 1002 contains one or more 3D representations of teeth (or gums), and the patient's original dentition contains one or more 2D dentition images 1202, techniques of this disclosure can register a 2D projection 2204 of the patient's 3D target dentition 1002 with the patient's 2D dentition image 1202.
- the patient's 2D dentition photo may include one or more images of the patient's teeth (e.g., when the patient's mouth is open for a smile, etc.)
- a target dentition 1002 may include 3D representations of the teeth and/or gums.
- the target dentition 1002 may contain one or more teeth which are designated for dental restorative treatment, and/or one or more teeth which are designated for orthodontic treatment.
- Raycasting techniques known to one skilled in the art may use camera parameters to project a 3D representation onto a 2D surface (e.g., project a 3D mesh into a 2D image).
- Camera parameters (e.g., azimuth, elevation, roll, focus, x-axis coordinate, y-axis coordinate, or z-axis coordinate, among others) may define the viewpoint from which the 3D representation is projected.
- the positive y-axis may point outward from the incisors, in the direction of the patient's forward gaze. Angles may be expressed in degrees, radians, or the like.
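As one illustration of how camera parameters can drive a projection, the sketch below implements a simple pinhole projection with Euler-angle extrinsics. It is an assumption-laden stand-in for raycasting: a production system would trace rays per pixel and record which tooth each ray intersects, and the axis convention here (camera looking along +z) is chosen for brevity rather than matching the dental y-axis convention described above.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def project_points(points, azimuth, elevation, roll, focus, cam_xyz):
    # World -> camera frame, then pinhole perspective projection.
    R = Rotation.from_euler("zyx", [azimuth, elevation, roll], degrees=True)
    cam = R.apply(points - np.asarray(cam_xyz))
    uv = focus * cam[:, :2] / cam[:, 2:3]  # perspective divide by camera-frame z
    return uv

# Hypothetical tooth vertices placed in front of a camera at the origin.
pts = np.random.rand(100, 3)
pts[:, 2] += 10.0
uv = project_points(pts, azimuth=0.0, elevation=0.0, roll=0.0,
                    focus=35.0, cam_xyz=[0.0, 0.0, 0.0])
```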
- camera parameters may be optimized using a stochastic, bounded global optimizer (e.g., differential evolution). Either gradient-based or non-gradient based methods may be used.
- the optimizer may maintain a population that includes a plurality of members (otherwise known as individuals), each of which includes at least a set of camera parameters.
- Each individual in the population may undergo variation operators, which may change the population member in small or large ways (e.g., by adding small amounts of noise to one or more of the intrinsic or extrinsic parameter values which are included in a population member).
- a population member may include a set of camera parameters which is capable of projecting the patient's 3D dentition into 2D using raycasting.
- Variation operators may operate on the population member (e.g., the data structure that is being optimized).
- the population member may be encoded into a latent representation using an encoder neural network, and then the variation operators may vary that latent representation through the course of optimization.
- the variation operators may vary the camera parameters (or latent representations of the camera parameters) of the individual, and then the fitness (or cost) associated with those modified camera parameters may be computed.
- This fitness evaluation may use the camera parameters 2202 to project (1008) the 3D meshes of the patient's target dentition 1002 into a segmented dentition mask 2204 (e.g., projected 2D dentition mask image).
- segmentation is performed through raycasting. For example, a pixel in a particular connected component of segmented dentition mask 2204 (e.g., projected 2D dentition mask image) belongs to the tooth which the ray intersects during the raycasting calculation.
- the new members may comprise mutated copies of the remaining members, or may be generated by combining aspects of two or more remaining members (e.g., crossover).
- the genetic algorithm may then evaluate the fitness of each member and iterate until a stopping criterion is met (e.g., a threshold count of iterations has transpired, an average population fitness has been achieved, or the population has produced at least one population member with a minimal fitness, etc.).
- the optimization algorithm may generate a set of camera parameters which accurately projects the patient's 3D target dentition 1002 into a projected 2D dentition mask image 2204 that aligns well with the corresponding segmented 2D dentition mask image 1208.
- dentition mask 2208 may be overlaid with dentition mask 2206, and a fitness (or cost) function may be computed.
- the fitness function determines how well the two dentition masks overlap.
- when target dentition data 1002 contains 3D tooth meshes, raycasting may project (1008) the 3D data of those tooth meshes into a 2D image 2204, forming one or more connected components.
- Each connected component (or set of adjacent connected components belonging to the same tooth) may have an assigned color or other identifying attributes which associate the connected component with a particular tooth.
- the dentition masks 2204 and 2206 each contain sets of these colored (or labelled) connected components.
- a segmented dentition 1208 (e.g., a segmented 2D dentition masks image) may be generated from the patient dentition 1202 (e.g., a 2D patient dentition image).
- a cost function may generate a zero value when the 'projected 2D dentition mask' and 'segmented 2D dentition mask' perfectly overlap.
- Examples of such a cost function include MSE, Huber Loss, Mean Absolute Error (MAE), Quantile Loss, Log-Cosh Loss, and Mean Absolute Percentage Error (MAPE), among others.
- a cost function may, in some implementations, compute the binary segmentation overlap error at each pixel location between two image masks. In some implementations, the overlap error at each pixel may be computed, at least in part, based on the distance between the pixel and the closest lip edge (e.g., the edge of smile mask 1308).
- the cost function may calculate the distance between each pixel in the dentition mask 2204 and the corresponding pixel(s) in dentition mask 1208. In some implementations, the cost function may calculate the distance between each pixel in the dentition mask 2206 and the corresponding pixel(s) in dentition mask 1208.
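One plausible form of the binary segmentation overlap cost described above, including the optional lip-edge distance weighting, is sketched below; the exact weighting used in practice is not prescribed here.

```python
import numpy as np

def mask_overlap_cost(mask_a, mask_b, lip_edge_dist=None):
    """Binary segmentation overlap error between two dentition masks.

    Returns 0.0 on perfect overlap. If `lip_edge_dist` (a per-pixel
    distance to the closest lip edge, e.g., the edge of smile mask 1308)
    is given, disagreements near the lips are weighted more heavily --
    one plausible reading of the distance-based weighting above.
    """
    err = (mask_a.astype(bool) ^ mask_b.astype(bool)).astype(float)  # per-pixel disagreement
    if lip_edge_dist is not None:
        err *= 1.0 / (1.0 + lip_edge_dist)  # emphasize pixels near the lip edge
    return err.mean()

a = np.zeros((64, 64)); a[20:40, 10:50] = 1  # projected 2D dentition mask (2204)
b = np.zeros((64, 64)); b[22:42, 12:52] = 1  # segmented 2D dentition mask (1208)
print(mask_overlap_cost(a, b))
```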
- Method 1000 shows how initial camera parameters 1004 may be used to project (1008) the 3D target dentition 1002 into an initial 2D representation 1014 (e.g., 2D image mask that contains connected components which show the outlines of the teeth).
- Each connected component may have a color or other identifier.
- a connected component may define the 2D profiles of one or more teeth.
- a subset of the connected components in image 1014 may be selected (1016).
- the mask which contains that subset of connected components is the initial projected 2D subset mask image 1018.
- the mask image 1018 may be registered (or aligned) with the segmented 2D dentition masks image 1208 by optimizing a cost function (or fitness function).
- the mask image 1208 may be generated by performing 2D tooth segmentation (1206) on the 2D patient dentition image 1202. Alternatively, if the patient dentition 1202 contains 3D data, then 3D mesh segmentation (1206) may be performed. Oral care arguments 1106 may be provided to the 2D or 3D segmentation methods. Examples of oral care arguments 1106 that can influence segmentation include thresholding values, among others.
- optimization algorithms which can be used to perform such an optimization include: simulated annealing, genetic algorithms, differential evolution, particle swarm analysis, ant colony optimization, or other optimization algorithms (e.g., either gradient-based or non-gradient based algorithms).
- One or more cost functions or fitness functions may quantify the correctness of the alignment of two or more dentitions.
- Cost (or fitness) functions may be iteratively evaluated to guide the operation of an optimization method, until the cost (or fitness) crosses a pre-determined threshold (or until a pre-determined count of iterations have transpired).
- the cost function may be minimized using optimization algorithms described herein.
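As a concrete (toy) illustration of bounded, stochastic global optimization of camera parameters, the sketch below uses SciPy's differential evolution. The cost function here is a synthetic quadratic standing in for the projection-and-mask-overlap evaluation described above, so that the example runs on its own; the bounds and parameter ordering are hypothetical.

```python
import numpy as np
from scipy.optimize import differential_evolution

TRUE = np.array([5.0, -3.0, 1.0, 35.0, 0.0, -10.0, 0.0])  # toy "correct" camera

def camera_cost(params):
    # Stand-in for: project target dentition 1002 with `params` (1008),
    # then compute the mask-overlap error against segmented mask 1208.
    # A synthetic quadratic plays that role so the sketch is runnable.
    return float(np.sum((params - TRUE) ** 2))

# Hypothetical bounds: azimuth, elevation, roll (degrees), focus, x, y, z.
bounds = [(-30, 30), (-30, 30), (-15, 15), (20, 60), (-5, 5), (-15, 5), (-5, 5)]
result = differential_evolution(camera_cost, bounds, maxiter=200, seed=0)
print(result.x)  # recovered camera parameters, analogous to 1104
```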
- Method 2200 describes a method of using a cost (or fitness) function to determine the effectiveness or utility of a set of one or more camera parameters (e.g., described herein as a population member 2202).
- the 2D image 2204 (or 3D meshes 1002) of the patient's target dentition may be aligned with a segmented 2D dentition masks image 1208 of the patient’s current dentition.
- the masks image 1208 may be generated by performing (1206) 2D image segmentation on a photo 1202 of the patient (such as when the patient’s teeth are showing in a smile).
- the masks image 1208 may contain connected components that show the outlines of the teeth.
- a cost function may be computed between two (or more) masks, to quantify the extent to which the masks differ.
- a cost function may be computed to quantify the difference between mask subset 2208 (e.g., a subset of tooth connected components from the original dentition data) and mask subset 2206 (e.g., a subset of tooth connected components from the target dentition data).
- One of the advantages of computing a cost function between two subset images is that the operation can be performed between teeth whose identities are known with high confidence. After tooth segmentation is performed (e.g., segmentation of patient dentition 1202), tooth identities may be determined. The determination of tooth identities may lead to erroneous identifications (or errors) a certain percentage of the time.
- Methods of determining tooth identity may, in some implementations, output confidence values.
- the central incisors may be identified in segmented dentition mask 1208 with high confidence, but a second bicuspid may be identified with low confidence (e.g., due to partial occlusion by the lips). Therefore, there is value in using teeth whose identities are known with high confidence.
- cost may be computed between corresponding teeth between two or more image masks. For example, a cost may be computed that quantifies the alignment of the ULI connected components in dentition mask 2206 and image mask 2208. Costs may be computed for other teeth, as well. In some implementations, the cost calculation for a particular tooth may incorporate cost information from one or more other teeth in the dentition (e.g., adjacent teeth). This approach can mitigate the error introduced by erroneous tooth identifications that may result from tooth segmentation.
- Oral care arguments 602 may include oral care arguments described herein, or any of the following: 1) indication of which one or more teeth are to be used to register the target dentition 1002 (e.g., 2D or 3D data) with the patient's original dentition data 1202 (e.g., 2D or 3D data). A subset of the patient's teeth may be used for registration (e.g., 4 teeth, or another count of teeth), thereby saving compute resources while still enabling registration to take place.
- Accuracy may actually be improved, because registration may be performed on teeth which may be more easily segmented (e.g., upper central incisors, among other examples); 2) information pertaining to segmentation, such as the threshold for the confidence of a tooth identity assignment that may result from tooth segmentation. Stated another way, the tooth segmentation process may assign a tooth identity to each segmented tooth in the segmentation output (e.g., 3D mesh or 2D image output).
- Each tooth identity assignment may have an associated confidence; 3) mouth openness threshold, which may specify a minimum width (or height) for patient smile mask 1308, or a minimum number of teeth which must be visible in patient dentition 1202 or target dentition 1002, for the patient's mouth to be considered sufficiently open for registration to be performed; 4) initial camera parameters (e.g., intrinsic or extrinsic) which may be used to project target dentition data 1002 into 2D image mask 2204 (e.g., when target dentition 1002 contains one or more 3D representations). Initial camera parameters may, in some instances, include zero values for some terms. The camera parameters may be optimized using a genetic algorithm, or other techniques described herein; and 5) other information pertaining to the registration of target dentition data 1002 and original dentition data 1202.
- An optimization module (e.g., containing a genetic algorithm, or others described herein) may be executed to generate new variations of the camera parameters (e.g., population members 2202).
- the new variations of camera parameters (e.g., population member 2202) may be evaluated by the process described above. As a result, each variation of camera parameters is assigned a cost or fitness value.
- the best variation of camera parameters may be identified (e.g., the variation of camera parameters which does the best job of projecting the 3D target dentition into a 2D image mask that aligns well with the patient’s current dentition).
- the population member with the highest fitness can be used as the optimal camera parameters 1104.
- a predicted setup or a predicted tooth restoration design (either of which may comprise 3D or 2D data) can be registered with a 2D photograph of the patient’s smile.
- the resulting image mask 1114 (e.g., a registered projected 2D dentition mask image) can be provided to denoising diffusion ML model 522 as a guide (or instruction) regarding the shapes (or poses) of the teeth which are to appear in the predicted smile.
- other data may also be optimized, according to particular implementations. For example, additional data may be included with the camera parameters within each population member 2202, such as data which describes the alignment of the upper and lower arches (e.g., translational and/or rotational parameters that adjust the relative poses of the upper and lower arches of the 3D target dentition 1002, or of other 3D mesh dentitions).
- the optimization of these data may improve the alignment between the upper and lower arches, and therefore improve the accuracy of the registration techniques, smile prediction techniques, or other digital orthodontic operations described herein.
- aspects of the present disclosure can provide a technical solution to the technical problem of predicting a realistic post-treatment appearance for a patient (e.g., a smile prediction), which may involve performing dentition registration.
- computing systems specifically adapted to perform smile prediction for aesthetic or clinical planning purposes are improved.
- aspects of the present disclosure improve the performance of a computing system having a 3D representation of the patient’s dentition by reducing the consumption of computing resources.
- aspects of the present disclosure improve the performance of a computing system which computes cost functions as a part of one or more optimization techniques.
- the cost functions may quantify the differences between two or more dentition masks, and techniques of this disclosure may reduce the number and identities of teeth represented in the dentition masks, and yet may still produce accurate registration results.
- removing teeth that have low-confidence segmentations (e.g., whose identities are known with lower confidence) from the dentition masks can actually increase the accuracy of the registration operation, because potentially erroneous data are removed.
- implementations of the present disclosure must be capable of: 1) storing thousands or millions of mesh elements of the patient's dentition in a manner that can be processed by a computer processor; 2) performing calculations on thousands or millions of mesh elements, e.g., to quantify aspects of the shape and/or structure of an individual tooth in the 3D representation of the patient's dentition; 3) generating thousands or millions of successively noisy versions of the training data in a Markov chain; 4) training UNets to perform the denoising operation (which may involve computing and removing noise tensors containing thousands or millions of values); and 5) generating a realistic smile for a patient, which may involve performing dentition registration, and doing so during the course of a short office visit.
- the mesh element feature vectors may provide the encoder with more information about the shape and/or structure of the mesh, and therefore the additional information provided allows the encoder to make better-informed decisions and/or generate more-accurate latent representations of the mesh.
- encoder-decoder structures include U-Nets, autoencoders or transformers (among others).
- a representation generation module may comprise one or more encoder-decoder structures (or portions of encoder-decoder structures, such as individual encoders or individual decoders).
- a representation generation module may generate an information-rich (optionally reduced-dimensionality) representation of the input data, which may be more easily consumed by other generative or discriminative machine learning models.
- An autoencoder may be configured to encode the input data into a latent form.
- An autoencoder may train an encoder to reformat the input data into a reduced-dimensionality latent form in between the encoder and the decoder, and then train a decoder to reconstruct the input data from that latent form of the data.
- a reconstruction error may be computed to quantify the extent to which the reconstructed form of the data differs from the input data.
- the latent form may, in some implementations, be used as an information-rich reduced- dimensionality representation of the input data which may be more easily consumed by other generative or discriminative machine learning models.
- an encoder-decoder structure may first be trained as an autoencoder. In deployment, one or more modifications may be made to the latent form of the input data. This modified latent form may then proceed to be reconstructed by the decoder, yielding a reconstructed form of the input data which differs from the input data in one or more intended aspects. Oral care arguments, such as oral care parameters or oral care metrics may be provided to the encoder, the decoder, or may be used in the modification of the latent form, to influence the encoder-decoder structure in generating a reconstructed form that has desired characteristics (e.g., characteristics which may differ from that of the input data).
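A minimal reconstruction autoencoder of the kind described above, including a deployment-time latent modification step, might look like the following PyTorch sketch (dimensions and data are placeholders):

```python
import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    """Sketch of the encoder-decoder training described above."""
    def __init__(self, in_dim=256, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                     nn.Linear(64, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                     nn.Linear(64, in_dim))
    def forward(self, x):
        z = self.encoder(x)  # reduced-dimensionality latent form
        return self.decoder(z), z

model = TinyAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(32, 256)  # stand-in for oral care input data

recon, z = model(x)
loss = nn.functional.mse_loss(recon, x)  # reconstruction error
opt.zero_grad(); loss.backward(); opt.step()

# Deployment-time latent modification: nudge the latent, then decode,
# yielding a reconstruction that differs in intended aspects.
with torch.no_grad():
    z_mod = z + 0.1 * torch.randn_like(z)
    edited = model.decoder(z_mod)
```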
- 3D meshes are commonly formed using triangles, but may in other implementations be formed using quadrilaterals, pentagons, or some other n-sided polygon.
- a 3D mesh may be converted to one or more voxelized geometries (i.e., comprising voxels), for example, so that sparse processing can be performed (e.g., using an encoder).
- one feature vector is generated per vertex of the mesh.
- a 3D mesh is a data structure which may describe the geometry or shape of an object related to oral care, including but not limited to a tooth, a hardware element, or a patient’s gum tissue.
- Table 1 discloses non-limiting examples of mesh element features (e.g., color or other visual cues/identifiers).
- a point differs from a vertex in that a point is part of a 3D point cloud, whereas a vertex is part of a 3D mesh and may have incident faces or edges.
- a dihedral angle (which may be expressed in either radians or degrees) may be computed as the angle (e.g., a signed angle) between two connected faces (e.g., two faces which are connected along an edge).
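A sketch of that dihedral-angle computation for two triangles sharing an edge, using NumPy, follows; the sign convention shown is one common choice and is not mandated by this disclosure.

```python
import numpy as np

def dihedral_angle(v0, v1, a, b):
    """Signed dihedral angle (radians) between two triangles that share
    the edge (v0, v1); `a` and `b` are the opposite vertices."""
    e = v1 - v0
    n1 = np.cross(e, a - v0)  # normal of face (v0, v1, a)
    n2 = np.cross(e, b - v0)  # normal of face (v0, v1, b)
    n1 /= np.linalg.norm(n1); n2 /= np.linalg.norm(n2)
    cos = np.clip(np.dot(n1, n2), -1.0, 1.0)
    sign = np.sign(np.dot(np.cross(n1, n2), e))
    return sign * np.arccos(cos)

v0, v1 = np.array([0., 0., 0.]), np.array([1., 0., 0.])
a, b = np.array([0.5, 1., 0.]), np.array([0.5, -1., 0.5])
print(np.degrees(dihedral_angle(v0, v1, a, b)))
```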
- the neural networks of the present disclosure may embody part or all of a variety of different neural network models. Examples include the U-Net architecture, multi-layer perceptron (MLP), transformer, pyramid architecture, recurrent neural network (RNN), autoencoder, variational autoencoder, regularized autoencoder, conditional autoencoder, capsule network, capsule autoencoder, stacked capsule autoencoder, denoising autoencoder, sparse autoencoder, long/short term memory (LSTM), gated recurrent unit (GRU), deep belief network (DBN), deep convolutional network (DCN), deep convolutional inverse graphics network (DCIGN), liquid state machine (LSM), extreme learning machine (ELM), echo state network (ESN), deep residual network (DRN), Kohonen network (KN), neural Turing machine (NTM), or generative adversarial network (GAN).
- Non-limiting examples of orthodontic procedure parameters include: Teeth To Move: {AnteriorsOnly, AnteriorsAndBicuspids, FullArch}, Spacing: {CloseAllSpaces, LeaveSpecificSpaces}, Resolve Lower Crowding by IPR - Posterior Right: {Primarily, AsNeeded, None}, or Resolve Lower Crowding by IPR - Posterior Left: {Primarily, AsNeeded, None}, among others. Doctor preferences may pertain to a clinician's past treatment practices, whereas oral care parameters may pertain to the treatment of a particular patient.
- Restoration Design Parameters may, in some implementations, specify at least one value which defines at least one aspect of planned dental restoration treatment for the patient (e.g., specifying desired target attributes of a tooth which is to undergo treatment with a dental restoration appliance).
- Doctor Restoration Design Preferences may, in some implementations, specify at least one typical value for an RDP, which may, in some instances, be derived from past cases which have been treated by one or more oral care practitioners.
- Restoration design parameters (RDP) may be used to encode aspects of smile design guidelines, such as parameters which pertain to the intended dimensions of a restored tooth.
- Non-limiting examples of restoration design parameters include: Tooth width at base (mesial to distal distance) in mm, Overall tooth shape {rectangular or ovoid, squared edges or rounded edges, etc.}, Amount of tooth display when the lips are at rest in mm, Tooth morphology - shape style guide {triangular, oval and rectangular, etc.}, Tooth morphology - Mamelon grooves {mamelon_style01, mamelon_style02, mamelon_style03, etc.}, Tooth morphology - perikymata {perikymata_style01, perikymata_style02, perikymata_style03, etc.}, among others.
- oral care arguments include doctor preferences (DP), restoration design preferences (RDP), or other types of preferences (e.g., preferences which pertain to the designs or specifications of fixture models or oral care appliances). Doctor preferences, restoration design preferences, or other types of preferences may define the typical treatment choices or practices of a particular clinician. Restoration design preferences are subjective to a particular clinician, and so differ from restoration design parameters. In some implementations, DP, RDP, or other preferences may be computed by unsupervised means, such as clustering, which may determine the typical values that a clinician uses in patient treatment. Oral care arguments may include oral care metrics.
- Oral care metrics may include orthodontic metrics (which may measure physical relationships between two or more teeth), restoration design metrics (which may measure physical relationships between two or more teeth, or may quantify physical aspects of particular tooth), or other types of metrics (which may measure aspects of an existing or generated 3D representation of oral care data).
- Orthodontic metrics may be used to quantify the physical arrangement of an arch of teeth for the purpose of orthodontic treatment or for other oral care treatments (e.g., fixture model generation or appliance component generation). These orthodontic metrics can measure how badly maloccluded the arch is, or conversely the metrics can measure how correctly arranged the teeth are.
- one or more orthodontic metrics may be taken from this section and incorporated into a loss computation, to quantify patterns of errors or deficiencies which may appear in predicted outputs.
- Within-arch orthodontic metrics include: Alignment - A 3D tooth orientation vector may be calculated using the tooth's mesial-distal axis. Canine Overbite - A distance may be computed between the upper canine and the lower canine on a given side, and/or between the upper pre-molar and the corresponding lower pre-molar. Leveling - The difference in height between two or more neighboring teeth.
- Midline - May compute the position of the midline for the upper incisors and/or the lower incisors, and then may compute the distance between them.
- Overjet - The upper and lower central incisors may be compared along the y-axis. The difference along the y-axis may be used as the overjet score.
- the following restoration design metrics may be measured and used in the generation of crowns, dental restoration appliances, veneers (veneers are a type of dental restoration appliance), or the like, with the objective of making the resulting teeth natural looking.
- Symmetry is generally a preferred facet. Shade and translucency may pertain, in particular, to the generation of crowns, though some implementations of dental restoration appliances may also consider this information (e.g., when a succession of dental restoration appliances is used to form nested veneers with at least one inner structure).
- Examples of inter-tooth RDM are enumerated as follows: s1) Bilateral Symmetry and/or Ratios: A measure of the symmetry between one or more teeth and one or more other teeth on opposite sides of the dental arch.
- Proportions of Adjacent Teeth: A measure of the width proportions of adjacent teeth, as measured as a projection along an arch onto a plane (e.g., a plane that is situated in front of the patient's face).
- the ideal proportions for use in the final restoration design can be, for example, the so-called golden proportions.
- Arch Discrepancies: A measure of any size discrepancies between the upper arch and lower arch, for example, pertaining to the widths of the teeth, for the purpose of dental restoration.
- Midline: A measure of the midline of the maxillary incisors, relative to the midline of the mandibular incisors.
- Proximal Contacts: A measure of the size (area, volume, circumference, etc.) of the proximal contact between adjacent teeth. In the ideal circumstance, the teeth touch along the mesial/distal surfaces and the gums fill in gingivally to where the teeth touch.
- Embrasure: In some implementations, techniques of this disclosure may measure the size (area, volume, circumference, etc.) of an embrasure, the gap between teeth at either the gingival or incisal edge.
- Examples of intra-tooth RDM are enumerated as follows, continuing with the numbering of the other RDM listed above.
- Length and/or Width: A measure of the length of a tooth relative to the width of that tooth. This metric may reveal, for example, that a patient has long central incisors. Width and length are defined as: a) width - mesial to distal distance; b) length - gingival to incisal distance; c) other dimensions of tooth body - the portions of tooth between the gingival region and the incisal edge.
- Tooth Morphology: A measure of the primary anatomy of the tooth shape, such as line angles, buccal contours, and/or incisal angles and/or embrasures. The frequency and/or dimensions may be measured.
- Shade and/or Translucency: A measure of tooth shade and/or translucency. Tooth shade is often described by the Vita Classical or 3D Master shade guide. Tooth translucency is described by transmittance or a contrast ratio. Tooth shade and translucency may be evaluated (or measured) based on one or more of the following kinds of data pertaining to teeth: the incisal edge, incisal third, body and gingival third. The enamel layer translucency is generally higher than that of the dentin or cementum layer.
- Shade and translucency may, in some implementations, be measured on a per-voxel (local) basis. Shade and translucency may, in some implementations, be measured on a per-area basis, such as an incisal area, tooth body area, etc. Tooth body may pertain to the portions of the tooth between the gingival region and the incisal edge.
- d10) Height of Contour: A measure of the contour of a tooth. When viewed from the proximal view, all teeth have a specific contour or shape, moving from the gingival aspect to the incisal. This is referred to as the facial contour of the tooth.
- a patient’s dentition may include one or more 2D representations of the patient's anatomy (including mouth, gums, teeth, and/or full face), or one or more 3D representations of the patient’s teeth (e.g., and/or associated transforms), gums and/or other oral anatomy.
- An orthodontic metric may, in some implementations, quantify the relative positions and/or orientations of at least one 3D representation of a tooth relative to at least one other 3D representation of a tooth.
- a restoration design metric may, in some implementations, quantify at least one aspect of the structure and/or shape of a 3D representation of a tooth.
- An orthodontic landmark may, in some implementations, locate one or more points or other structural regions of interest on a 3D representation of a tooth.
- An OL may, in some implementations, be provided to the generation of an orthodontic or dental appliance, such as a clear tray aligner or a dental restoration appliance.
- a mesh element may, in some implementations, comprise at least one constituent element of a 3D representation of oral care data.
- mesh elements may include at least: vertices, edges, faces and voxels.
- a mesh element feature may, in some implementations, quantify some aspect of a 3D representation in proximity to or in relation with one or more mesh elements, as described elsewhere in this disclosure.
- Orthodontic procedure parameters may, in some implementations, specify at least one value which defines at least one aspect of planned orthodontic treatment for the patient (e.g., specifying desired target attributes of a final setup in final setups prediction).
- Orthodontic Doctor preferences may, in some implementations, specify at least one typical value for an OPP, which may, in some instances, be derived from past cases which have been treated by one or more oral care practitioners.
- Restoration Design Parameters may, in some implementations, specify at least one value which defines at least one aspect of planned dental restoration treatment for the patient (e.g., specifying desired target attributes of a tooth which is to undergo treatment with a dental restoration appliance).
- Doctor Restoration Design Preferences may, in some implementations, specify at least one typical value for an RDP, which may, in some instances, be derived from past cases which have been treated by one or more oral care practitioners.
- Losses may also be used to train encoder structures, or decoder structures, among others.
- a KL-Divergence loss may be used, at least in part, to train one or more of the neural networks of the present disclosure, with the advantage of imparting Gaussian behavior to the optimization space. This Gaussian behavior may enable a reconstruction autoencoder to produce a better reconstruction (e.g., when a latent vector representation is modified and that modified latent vector is reconstructed using a decoder, the resulting reconstruction is more likely to be a valid instance of the provided representation).
- There are other techniques for computing losses which may be described elsewhere in this disclosure. Such losses may be based on quantifying the difference between two or more 3D representations.
- MSE loss calculation may involve the calculation of an average squared distance between two sets, vectors or datasets. MSE may be generally minimized. MSE may be applicable to a regression problem, where the prediction generated by the neural network or other machine learning model may be a real number.
- a neural network may be equipped with one or more linear activation units on the output to generate an MSE prediction.
- MAE loss and MAPE loss can also be used in accordance with the techniques of this disclosure.
- Cross entropy may, in some implementations, be used to quantify the difference between two or more distributions. Cross entropy loss may, in some implementations, be used to train the neural networks of the present disclosure.
- Cross entropy loss may, in some implementations, involve comparing a predicted probability to a ground truth probability. Other names of cross entropy loss include “logarithmic loss,” “logistic loss,” and “log loss”. A small cross entropy loss may indicate a better (e.g., more accurate) model. Cross entropy loss may be logarithmic. Cross entropy loss may, in some implementations, be applied to binary classification problems. In some implementations, a neural network may be equipped with a sigmoid activation unit at the output to generate a probability prediction. In the case of multi-class classifications, cross entropy may also be used.
- a neural network trained to make multi-class predictions may, in some implementations, be equipped with one or more softmax activation functions at the output (e.g., where there is one output node per class that is to be predicted).
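The following PyTorch fragment illustrates both variants described above: multi-class cross entropy with one output node per class, and binary cross entropy with a sigmoid output. The class count and batch size are arbitrary.

```python
import torch
import torch.nn as nn

logits = torch.randn(8, 32)           # e.g., one output node per tooth-identity class
targets = torch.randint(0, 32, (8,))  # ground-truth class indices

# CrossEntropyLoss applies log-softmax internally, matching the
# softmax-activated multi-class setup described above.
multiclass_loss = nn.CrossEntropyLoss()(logits, targets)

# Binary case: sigmoid output plus binary cross entropy.
probs = torch.sigmoid(torch.randn(8))
binary_loss = nn.functional.binary_cross_entropy(
    probs, torch.randint(0, 2, (8,)).float())
```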
- Other loss calculation techniques which may be applied in the training of the neural networks of this disclosure include one or more of: Huber loss, Hinge loss, Categorical hinge loss, cosine similarity, Poisson loss, Logcosh loss, or mean squared logarithmic error loss (MSLE).
- Other loss calculation methods are described herein and may be applied to the training of any of the neural networks described in the present disclosure.
- In some instances, the forward pass 206 of the denoising diffusion model may generate a training dataset of increasingly noisy examples of the input data.
- the model is capable of generating new 3D oral care representations by passing a noisy data example (e.g., a randomly generated noisy example) through the denoising diffusion process (also known as the reverse process).
- the trained denoising ML model 210 is capable of modifying an initial 3D oral care representation which is provided to the input of the reverse pass 222, and outputting a resulting 3D oral care representation which is suitable for use in generating an oral care appliance which is customized to the treatment needs of the patient.
- the denoising ML model 210 of the reverse pass 208 may modify an existing 3D oral care representation (e.g., modify a pre-restoration tooth design).
- the existing 3D oral care representation (e.g., an example of optional instant patient case data 216) can be provided to the input of the reverse pass, and then undergo a succession of denoising steps by the denoising ML model 210, until modifications are complete (e.g., as measured by oral care metrics, loss functions, or a threshold number of iterations has expired, etc.).
- Optional instant patient case data 216 may include data pertaining to the dentition of a patient.
- the instant patient case data may, in some implementations, be introduced to customize the functioning of the denoising diffusion ML model to the anatomy of the patient.
- the instant patient case data may be encoded 218 into a latent or embedded form.
- the instant patient case data may be provided to the denoising diffusion ML model 210.
- the instant data 216 may comprise an appliance component or a fixture model component which requires modification.
- the data structure which undergoes iterative denoising by the denoising ML model 210 may be initialized, at least in part, according to a stochastic process (e.g., by introducing random noise, normally distributed noise, or random configurations), by aspects of the instant patient case data 216, or by a combination of the two.
- Generated 3D oral care representations 220 may include setups transforms for one or more teeth, transforms for the placement of one or more appliance components (e.g., for the generation of a dental restoration appliance), one or more 3D representations of post-restoration tooth designs, one or more generated (or modified) appliance components (e.g., a parting surface or gingival ribbon for use in generating a dental restoration appliance), one or more generated (or modified) fixture model components (e.g., one or more trimlines, etc.), post-segmentation dental arch mesh(es), post-cleanup dental arch mesh(es), one or more mesh element labels for use in segmentation or mesh cleanup, one or more object masks (e.g., masks to be applied to mesh elements) for use in segmentation or mesh cleanup, one or more coordinate axes for one or more predicted coordinate systems, one or more archforms, or other of the 3D oral care representations described herein.
- a cohort patient case may include a set of tooth crown meshes, a set of tooth root meshes, or a data file containing attributes of the case (e.g., a JSON file).
- a typical example of a cohort patient case may contain up to 32 crown meshes (e.g., which may each contain tens of thousands of vertices or tens of thousands of faces), up to 32 root meshes (e.g., which may each contain tens of thousands of vertices or tens of thousands of faces), multiple gingiva meshes (e.g., which may each contain tens of thousands of vertices or tens of thousands of faces), or one or more JSON files which may each contain tens of thousands of values (e.g., objects, arrays, strings, real values, Boolean values or Null values).
- a text encoder may encode a set of natural language instructions from the clinician (e.g., generate a text embedding).
- a text string may comprise tokens.
- An encoder for generating text embeddings may, in some implementations, apply either mean-pooling or max-pooling between the token vectors.
- a transformer (e.g., BERT or Siamese BERT) may be trained to extract embeddings of text for use in digital oral care (e.g., by training the transformer on examples of clinical text, such as those given below).
- such a model for generating text embeddings may be trained using transfer learning (e.g., initially trained on another corpus of text, and then receive further training on text related to digital oral care). Some text embeddings may encode text at the word level. Some text embeddings may encode text at the token level.
- a transformer for generating a text embedding may, in some implementations, be trained, at least in part, with a loss calculation which compares predicted outputs to ground truth outputs (e.g., softmax loss, multiple negatives ranking loss, MSE margin loss, cross-entropy loss or the like).
- non-text arguments, such as real values or categorical values, may be converted to text, and subsequently embedded using the techniques described herein.
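A sketch of masked mean-pooling over token vectors, using the Hugging Face transformers library, is shown below; the checkpoint name and the example instruction text are assumptions, not requirements of this disclosure.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Any BERT-style checkpoint works for this sketch; this disclosure does
# not mandate a specific model.
name = "sentence-transformers/all-MiniLM-L6-v2"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

texts = ["Close all spaces; extrude upper canines 0.5 mm."]  # hypothetical instruction
batch = tok(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    token_vecs = model(**batch).last_hidden_state  # (batch, tokens, dim)

# Mean-pooling over token vectors, masked so padding does not contribute.
mask = batch["attention_mask"].unsqueeze(-1).float()
embedding = (token_vecs * mask).sum(1) / mask.sum(1)  # text embedding
```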
- the resulting predicted setup may then be integrated with a 2D or 3D representation of the patient's face, resulting in one or more smile predictions.
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Biomedical Technology (AREA)
- Medical Informatics (AREA)
- Public Health (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Epidemiology (AREA)
- Primary Health Care (AREA)
- Databases & Information Systems (AREA)
- Pathology (AREA)
- Animal Behavior & Ethology (AREA)
- Veterinary Medicine (AREA)
- Dentistry (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Probability & Statistics with Applications (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
- Radiology & Medical Imaging (AREA)
- Dental Tools And Instruments Or Auxiliary Dental Instruments (AREA)
Abstract
Systems and methods are provided for generating accurate predictions of a patient's post-treatment anatomy. The methods include receiving one or more oral care arguments that describe the desired outcome of a trained machine learning model. One or more noised representations of the patient's post-treatment face and/or dentition may be used to train a denoising diffusion probabilistic model. At deployment, an initially noisy image may be denoised, conditioned on latent representations of the patient's pre-treatment photo and the post-restoration dentition. The methods may generate a digital representation of the patient's post-treatment appearance. These systems and methods enable improved prediction accuracy and support treatment planning for orthodontic care and dental restoration treatments.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363609938P | 2023-12-14 | 2023-12-14 | |
| US63/609,938 | 2023-12-14 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025126150A1 (fr) | 2025-06-19 |
Family
ID=94278375
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/IB2024/062651 Pending WO2025126150A1 (fr) | 2023-12-14 | 2024-12-13 | Débruitage de modèles probabilistes de diffusion pour prédiction d'anatomie post-traitement dans des soins buccaux numériques |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025126150A1 (fr) |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2020005386A1 (fr) * | 2018-06-29 | 2020-01-02 | Align Technology, Inc. | Présentation d'un résultat simulé de traitement dentaire sur un patient |
- 2024-12-13: PCT/IB2024/062651 filed, published as WO2025126150A1 (fr); status: active, Pending
Patent Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2020005386A1 (fr) * | 2018-06-29 | 2020-01-02 | Align Technology, Inc. | Présentation d'un résultat simulé de traitement dentaire sur un patient |
Non-Patent Citations (5)
| Title |
|---|
| Amirhossein Kazerouni et al.: "Diffusion Models for Medical Image Analysis: A Comprehensive Survey", arXiv, 3 June 2023 (2023-06-03), XP091527934 * |
| Andreas Lugmayr et al.: "RePaint: Inpainting using Denoising Diffusion Probabilistic Models", arXiv, 31 August 2022 (2022-08-31), XP091306329 * |
| Feihong Shen et al.: "OrthoGAN: High-Precision Image Generation for Teeth Orthodontic Visualization", arXiv, 29 December 2022 (2022-12-29), XP091404411 * |
| Chitwan Saharia et al.: "Palette: Image-to-Image Diffusion Models", ACM SIGGRAPH 2022 Conference Proceedings, 27 July 2022 (2022-07-27), pages 1-10, XP058910884, ISBN: 978-1-4503-9741-4, DOI: 10.1145/3528233.3530757 * |
| Yulong Dou et al.: "3D Structure-guided Network for Tooth Alignment in 2D Photograph", arXiv, 17 October 2023 (2023-10-17), XP091638173 * |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US12064311B2 (en) | | Visual presentation of gingival line generated based on 3D tooth model |
| US20250209631A1 (en) | | Apparatuses and methods for three-dimensional dental segmentation using dental image data |
| US20200402647A1 (en) | | Dental image processing protocol for dental aligners |
| EP4633528A1 (fr) | | Denoising diffusion models for digital oral care |
| EP4634798A1 (fr) | | Neural network techniques for appliance creation in digital oral care |
| CN119923238A (zh) | | Dental restoration automation |
| WO2024127309A1 (fr) | | Autoencoders for final setups and intermediate staging of clear tray aligners |
| WO2024127311A1 (fr) | | Machine learning models for dental restoration design generation |
| EP4634936A1 (fr) | | Reinforcement learning for final setups and intermediate staging in clear tray aligners |
| CN119547153A (zh) | | Geometry generation for dental restoration appliances and validation of that geometry |
| WO2024127313A1 (fr) | | Computation and visualization of metrics in digital oral care |
| EP4634934A1 (fr) | | Geometric deep learning for final setups and intermediate staging in clear tray aligners |
| EP4633526A1 (fr) | | Transformers for final setups and intermediate staging in clear tray aligners |
| US20250363269A1 (en) | | Fixture Model Validation for Aligners in Digital Orthodontics |
| WO2025126150A1 (fr) | | Denoising diffusion probabilistic models for post-treatment anatomy prediction in digital oral care |
| EP4540833A1 (fr) | | Validation of tooth setups for aligners in digital orthodontics |
| WO2025257745A1 (fr) | | Machine learning models using spectral autoencoding in digital oral care |
| WO2025074322A1 (fr) | | Combined orthodontic and dental restoration treatments |
| WO2025126117A1 (fr) | | Machine learning models for predicting data structures relating to interproximal reduction |
| EP4633527A1 (fr) | | Pose transfer techniques for 3D oral care representations |
| Binvignat et al. | | AI in Learning Anatomy and Restoring Central Incisors: A Comparative Study |
| WO2024127308A1 (fr) | | Classification of 3D oral care representations |
| WO2024127307A1 (fr) | | Setups comparison for final setups and intermediate staging of clear tray aligners |
| WO2024127314A1 (fr) | | Imputation of parameter values or metric values in digital oral care |
| CN120345003A (zh) | | Autoencoders for validation of 3D oral care representations |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 24837156; Country of ref document: EP; Kind code of ref document: A1 |