System and method for constructing a three-dimensional model of at least one user's dentition
Technical Field
The present invention relates to a system and method for constructing a three-dimensional model of dentition of at least one user in the dental field.
The system and method of the present invention are implemented using passive stereo technology in combination with neural networks. As a result, the intraoral imaging device only needs to contain the cameras necessary to accomplish this task, while all data processing takes place outside the intraoral device, which significantly reduces its complexity and cost compared to prior art solutions.
In this sense, the system of the present invention generally comprises at least one intraoral device to be placed in an oral cavity of at least one user, at least one set of cameras disposed in the at least one intraoral device for capturing at least one stereoscopic image of the oral cavity of the at least one user, and at least one processing medium receiving the at least one stereoscopic image, wherein the at least one processing medium comprises at least one trained neural network that analyzes the at least one stereoscopic image to estimate at least one depth map, and wherein the at least one processing medium further comprises at least one localization and mapping module that sequentially integrates the at least one stereoscopic image and the at least one depth map into a generated three-dimensional model.
In another aspect, the method of the present invention generally includes the steps of capturing at least one stereoscopic image via at least one set of cameras disposed in at least one intraoral device, receiving the at least one stereoscopic image by at least one processing medium, analyzing the at least one stereoscopic image via at least one trained neural network contained by the at least one processing medium to estimate at least one depth map, and integrating the at least one stereoscopic image and the at least one depth map sequentially into a generated three-dimensional model via at least one localization and mapping module contained by the at least one processing medium.
The systems and methods of the present invention not only make such techniques more accessible to dental patients by simplifying the physical equipment required for these procedures, but also provide a more accurate solution than conventional modeling techniques.
Background
In dentistry and its various specialized fields, the use of techniques capable of modeling the dentition of a patient is a widespread practice, since the modeling process is fast and simple.
Most of these solutions contemplate the use of an intraoral device or scanner comprising at least one camera, using known three-dimensional modeling principles such as confocal microscopy, active stereo or structured light, combined with algorithms that generally perform the processing within the device itself.
In this sense, confocal microscopy is the most complex technique in terms of hardware usage, and therefore more costly to manufacture and implement. This technique estimates the depth value of an image by changing the focus of the light source and filtering out-of-focus light using a lens. For each focus setting, the camera receives light from the focused surface area. Since the focal point of the light is controllable and the geometry of the lens is known, there is a relationship whereby the depth value of the focal region can be estimated.
On the other hand, scanners using active stereo technology capture image pairs through stereo cameras while projecting patterns onto a surface. Then, the algorithm calculates the correspondence between the two images using the projection pattern. The technology requires the provision of a pattern projector within the scanner or intraoral device, which also makes its structure more complex.
Finally, scanners using structured light technology project a known light pattern onto a surface and capture an image with a camera. The algorithm then deduces the topology and depth of such a surface by observing how the light pattern on the surface is deformed. Also, scanners using this technology must have additional hardware, which makes them more costly to manufacture.
Furthermore, since in these solutions most of the processing is handled within the intraoral scanner, great care must be taken to avoid damaging the equipment, as any maintenance may result in high costs.
There is therefore an increasing need for a system and method that not only simplifies the construction of scanners or intraoral devices by using fewer components and separating data acquisition from data processing, but that can also significantly improve the quality of three-dimensional modeling, as there is still room in the art for improvement in this regard.
In the patent field, there are some solutions directed to apparatuses or systems for three-dimensional modeling of dentition in the dental field. For example, US patent application US20140146142A1 describes a three-dimensional measurement device intended for measurement without structured or active light projection, comprising an image acquisition device and a data processor for the images. The image acquisition device is capable of capturing at least two images simultaneously or nearly simultaneously, one of which is fully or partially contained in the other. The contained image describes a narrower field of view than the other image and has a higher accuracy than the other image.
In this sense, although document US20140146142A1 describes a device with at least one camera that processes information outside the device, in particular on a dedicated computer, it does not mention the use of a neural network trained in advance with dentition models during the processing of the information; it mentions only the use of several algorithms at different stages of the processing. Since the processing in document US20140146142A1 is performed on the computer of the professional performing the operation, the algorithm used cannot be too complex, as this would require a computer with high processing power, which would make the device usable only by a small number of people. Thus, this document fails to provide an algorithm that can achieve results superior to the state of the art, as its performance is limited by the processing power of the user's computer. This limitation is not present in the present invention: because the processing medium is located outside the intraoral device (and may be hosted in the cloud), the processing power can be significantly increased, for example to run a trained neural network with a large number of parameters, thereby significantly improving the results of the three-dimensional modeling.
Another example is disclosed in international patent application WO2021250091A1, which describes a method of automatic segmentation of a dental arch, comprising obtaining a three-dimensional surface of the dental arch to obtain a three-dimensional representation containing a set of vertices, generating a virtual view from the three-dimensional representation, projecting the three-dimensional representation onto each two-dimensional virtual view to obtain images representing each vertex in the virtual view, processing each image using a deep learning network, backprojecting each image to assign pixels appearing and corresponding in each image to each vertex in the three-dimensional representation, and assigning each vertex one or more probability vectors to determine the class of dental tissue to which each vertex belongs.
When comparing the description of document WO2021250091A1 with the present application, it can be observed that, although said document describes an intraoral device for image processing using trained neural network technology, the algorithm is arranged within a processing module in the intraoral device, which greatly limits the processing power of the device and thus the complexity of the neural network used. This does not occur in the present application, where all processing is done outside the intraoral device, thus not limiting processing power and improving the quality of three-dimensional modeling, allowing a reconstruction of a three-dimensional model to be obtained in virtually real time. Furthermore, the structure of the intraoral device is simplified, since it has only the camera and related circuitry required to acquire an image.
It can be seen from the above documents that the object of most such devices and systems is not to simplify the structure of the intraoral device, and thereby reduce costs, but to solve other types of problems, such as avoiding the use of structured light. Although solutions exist that perform data processing outside the intraoral device or that analyze images using a neural network, they are far from the solution described in the present invention. In addition to providing a device with a simplified structure, the present invention allows the processing to be performed on a server located in the cloud, so that the user does not need on-site processing capabilities. This significantly improves the quality of the three-dimensional modeling of the dentition, since more complex processing algorithms can be used in combination with highly parameterized, trained neural networks that analyze the received images in order to determine the corresponding depth map.
Accordingly, there is a need to provide a system and method that not only provides a simpler, less costly intraoral device, but also improves the quality and speed of three-dimensional modeling of a patient's dentition so that the dentist responsible for the operation can obtain results in real time. Furthermore, there is a need for a solution that avoids the use of complex and costly equipment for data processing. The present invention achieves these objects by having a server receive the information obtained by the intraoral device, either directly from the intraoral device or via a computer or similar electronic device acting as a relay, and by allowing the result to be viewed. This and other advantages associated with other aspects of the technology are described in more detail below.
Disclosure of Invention
The present invention relates to a system and method for constructing in real time a three-dimensional model of the dentition of at least one user, which simplifies the structure of the intraoral device used, while improving the quality and speed of the three-dimensional modeling.
According to a first preferred embodiment of the present invention, a system for constructing a three-dimensional model of a dentition of at least one user comprises:
- at least one intraoral device placed in the mouth of at least one user;
- at least one set of cameras arranged in said at least one intraoral device for capturing at least one stereoscopic image of an oral cavity of said at least one user; and
- at least one processing medium receiving the at least one stereoscopic image;
wherein the at least one processing medium comprises at least one trained neural network that analyzes the at least one stereoscopic image to estimate at least one depth map, and
wherein the at least one processing medium further comprises at least one localization and mapping module that sequentially integrates the at least one stereoscopic image and the at least one depth map into the generated three-dimensional model.
The system of the present invention operates according to passive stereoscopic techniques. This means that it uses pairs of synchronized images and an algorithm based on a trained neural network that can estimate the depth values of the scanned surface without projecting anything onto it, unlike common prior art solutions.
In scenes with little texture and/or with reflections, such as a patient's mouth, conventional passive stereoscopic algorithms often perform poorly. The poor performance arises because conventional algorithms find keypoints in each image and then attempt to match the keypoints in one image with the keypoints in the other image (this is called stereo matching). When there is little texture and/or there are reflections, the keypoints are indistinct and the algorithm makes many matching errors.
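The ambiguity described above can be illustrated with a deliberately simplified sketch. It does not reproduce any particular prior art algorithm: it assumes rectified images, matches a small window along one scanline using a sum of absolute differences, and uses made-up pixel values. The function names are hypothetical.

```python
def sad_costs(left_row, right_row, x, half_window, max_disparity):
    """Cost of matching the window centred at x in the left row against the
    right row, for every candidate disparity from 0 to max_disparity."""
    costs = []
    for d in range(max_disparity + 1):
        cost = 0
        for offset in range(-half_window, half_window + 1):
            cost += abs(left_row[x + offset] - right_row[x + offset - d])
        costs.append(cost)
    return costs


def best_disparities(costs):
    """Return every disparity that reaches the minimum cost (ties mean ambiguity)."""
    lowest = min(costs)
    return [d for d, c in enumerate(costs) if c == lowest]


# Textured scanline (illustrative values): the right view is the left view
# shifted by 2 pixels, so the true disparity is 2.
textured_left = [10, 50, 20, 90, 30, 70, 40, 80, 15, 60]
textured_right = textured_left[2:] + [0, 0]

# Low-texture scanline: a uniform surface looks the same at every shift.
flat_left = [128] * 10
flat_right = [128] * 10

textured_match = best_disparities(sad_costs(textured_left, textured_right, 6, 1, 3))
flat_match = best_disparities(sad_costs(flat_left, flat_right, 6, 1, 3))

print(textured_match)  # a single, unambiguous disparity
print(flat_match)      # every candidate ties, so the match is undetermined
```

On the textured scanline exactly one disparity minimizes the cost, whereas on the uniform scanline all candidates tie, which is the failure mode attributed to low-texture intraoral surfaces.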
The present invention uses an algorithm based on a highly complex neural network rather than a conventional stereo matching algorithm. Such a neural network estimates depth values of images without using explicit keypoints and achieves much higher accuracy than conventional stereo matching algorithms. To achieve this accuracy, the neural network must be trained with highly realistic synthetic images and depth values.
Because it uses the neural network, the algorithm of the present application requires high processing power. In order to avoid making the intraoral device more complex and thus more expensive, the processing is performed outside the device, either in external electronic equipment connected to it or on a server provided in the cloud. In this way, the use of the neural network makes it possible to solve the two problems addressed by the present application: simplifying the hardware used, and improving the quality of the three-dimensional model of the patient's dentition, which the user can obtain in real time.
According to another embodiment of the invention, the at least one processing medium further comprises at least one post-processing module that removes at least one noise depth value from the three-dimensional model and recalculates the pose of the at least one set of cameras. The post-processing is performed using all information captured during the scan performed by the intraoral device. The pose of the at least one set of cameras must be recalculated because, as images are captured and depth values estimated, the system of the present invention estimates where the at least one set of cameras is located in order to know how to combine the different depth values. This pose estimation accumulates errors, which have to be corrected in the post-processing stage.
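The text does not specify how noise depth values are detected. As one possible illustration, the following minimal sketch applies a radius-based outlier filter to the points of the model: a point is kept only if enough other points lie nearby. The technique, the function name, the threshold values and the sample coordinates are all assumptions made for this example.

```python
import math


def remove_isolated_points(points, radius, min_neighbours):
    """Keep only the points that have at least min_neighbours other points
    within the given radius (brute force, adequate for a small example)."""
    kept = []
    for i, p in enumerate(points):
        neighbours = sum(
            1
            for j, q in enumerate(points)
            if i != j and math.dist(p, q) <= radius
        )
        if neighbours >= min_neighbours:
            kept.append(p)
    return kept


# Hypothetical data: a small flat patch of surface at depth 10.0 ...
surface = [(x * 0.5, y * 0.5, 10.0) for x in range(3) for y in range(3)]
# ... plus one spurious depth value far away from the patch.
noisy_model = surface + [(0.5, 0.5, 25.0)]

cleaned_model = remove_isolated_points(noisy_model, radius=1.0, min_neighbours=2)
print(len(noisy_model), len(cleaned_model))
```

A practical implementation would likely use a spatial index rather than the quadratic loop shown here, and would run together with the pose recalculation described above.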
According to another embodiment of the invention, the system further comprises at least one receiving device which receives at least one stereoscopic image from at least one set of cameras and transmits it to at least one processing medium.
According to another embodiment of the present invention, the at least one receiving device is at least one of a computer, a notebook computer, a tablet computer, and a smart phone.
According to a further embodiment of the invention, at least one processing medium is provided in at least one receiving device. This allows users to perform data processing on their own equipment, provided they possess the necessary processing power.
According to another embodiment of the invention, the at least one receiving device comprises at least one display interface. The display interface allows the user to see in real time the final three-dimensional model of the dentition and/or images acquired by the intraoral device.
According to another embodiment of the invention, the at least one processing medium is arranged in the cloud. This enables users who do not have sufficient processing power to host the processing medium and neural network to send the information collected by the intraoral device to the cloud server, either directly or via a linked receiving device (which, as previously described, may be a computer, a tablet, etc.).
In addition, as the processing medium and the neural network are located in the cloud, the system can analyze information from a plurality of intraoral devices at the same time and return corresponding three-dimensional models in real time. By this function, the cost of the system can be further reduced, since there is no need to equip each running intraoral device with a processing medium, unlike the usual solutions.
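How a single processing medium might serve several intraoral devices at once can be sketched as follows. This is only an illustration under assumed names: the class, the `device_id` field and the byte payloads are hypothetical, and a real system would keep a partial three-dimensional model per session rather than a list of frames.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StereoFrame:
    device_id: str  # hypothetical identifier of the sending intraoral device
    left: bytes     # encoded left image
    right: bytes    # encoded right image


class SharedProcessingMedium:
    """One processing medium that keeps a separate session per device."""

    def __init__(self):
        self._sessions = defaultdict(list)

    def receive(self, frame):
        # Route the frame to the session of the device that sent it.
        self._sessions[frame.device_id].append(frame)
        return len(self._sessions[frame.device_id])

    def frames_for(self, device_id):
        # Number of frames received so far from the given device.
        return len(self._sessions.get(device_id, []))


medium = SharedProcessingMedium()
medium.receive(StereoFrame("scanner-A", b"L0", b"R0"))
medium.receive(StereoFrame("scanner-B", b"L0", b"R0"))
medium.receive(StereoFrame("scanner-A", b"L1", b"R1"))
print(medium.frames_for("scanner-A"), medium.frames_for("scanner-B"))
```

Keying every session by device keeps the scans of different patients separate while the neural network and the mapping module are shared.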
According to another embodiment of the invention, the at least one set of cameras comprises at least one first camera and at least one second camera. This makes it possible to obtain the pair of synchronized images required by the algorithm and the neural network to correctly estimate the depth of the scanned surface.
According to another embodiment of the invention, at least one intraoral device is in wireless communication with at least one processing medium.
According to another embodiment of the invention, at least one intraoral device communicates with at least one processing medium via a communication cable.
According to another embodiment of the invention, at least one intraoral device is in wireless communication with at least one receiving device.
According to another embodiment of the invention, the at least one intraoral device communicates with the at least one receiving device via a communication cable.
According to another embodiment of the invention, the at least one receiving device is in wireless communication with the at least one cloud.
According to another embodiment of the invention, the at least one intraoral device further comprises at least one battery that allows the intraoral device to operate without direct connection to a power source (e.g., plug).
In another aspect, there is also described in accordance with a second preferred embodiment of the present invention a method for constructing a three-dimensional model of at least one user's dentition, comprising the steps of:
a) Capturing at least one stereoscopic image by at least one set of cameras disposed in at least one intraoral device;
b) Receiving at least one stereoscopic image through at least one processing medium;
c) Analyzing the at least one stereoscopic image by at least one trained neural network included in at least one processing medium to estimate at least one depth map, and
d) Integrating the at least one stereoscopic image and the at least one depth map sequentially into the generated three-dimensional model by at least one localization and mapping module included in the at least one processing medium.
As described above, when capturing images, the intraoral device transmits the stereoscopic images in real time to a linked device (such as a computer), which either forwards them to a cloud server where the processing medium is located or passes them directly to a processing medium arranged in the computer itself. Once the images are available for analysis by the processing medium, the neural network trained for that task analyzes each stereoscopic image sent and estimates a depth map from it.
The depth map and the stereoscopic image are integrated together into the reconstruction generated at the current moment by means of at least one localization and mapping module which compares the information it receives with the partial reconstruction of the scene and predicts the camera pose of the stereoscopic image.
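The flow of steps a) to d) can be summarized in a minimal sketch. The neural network and the localization and mapping module are replaced here by trivial stand-ins, and all class and function names are hypothetical; the sketch only shows the order in which the data moves, not how depth or pose are actually computed.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Image = List[List[float]]
DepthMap = List[List[float]]


@dataclass
class StereoImage:
    left: Image
    right: Image


@dataclass
class LocalizationAndMappingStub:
    """Stand-in for the localization and mapping module: it only records what
    it receives, where a real module would estimate the camera pose and fuse
    the data into the partial reconstruction."""
    integrated: List[Tuple[StereoImage, DepthMap]] = field(default_factory=list)

    def integrate(self, image: StereoImage, depth_map: DepthMap) -> None:
        self.integrated.append((image, depth_map))


def constant_depth_stub(image: StereoImage) -> DepthMap:
    """Stand-in for the trained neural network: one depth value per pixel."""
    return [[1.0 for _ in row] for row in image.left]


def build_model(stream, estimate_depth, mapper):
    for image in stream:                    # steps a) and b): captured and received
        depth_map = estimate_depth(image)   # step c): depth map estimation
        mapper.integrate(image, depth_map)  # step d): sequential integration
    return mapper


frame = StereoImage(left=[[0.0, 0.0], [0.0, 0.0]], right=[[0.0, 0.0], [0.0, 0.0]])
mapper = build_model([frame, frame], constant_depth_stub, LocalizationAndMappingStub())
print(len(mapper.integrated))
```

Passing the depth estimator in as a function reflects the point that the same flow applies whether the processing medium runs in the cloud or on the receiving device.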
According to another embodiment of the invention, the method further comprises removing at least one noise depth value from the three-dimensional model by at least one post-processing module comprised in the at least one processing medium and recalculating the pose of the at least one set of cameras.
According to another embodiment of the invention, the method further comprises, prior to step c), receiving the at least one stereoscopic image from the at least one set of cameras by at least one receiving device, and subsequently transmitting it to the at least one processing medium.
According to another embodiment of the invention, the method further comprises displaying information transmitted and received from the at least one processing medium via at least one display interface provided on the at least one receiving device.
According to another embodiment of the invention, the method further comprises generating a three-dimensional model of the dentition of the at least one user in real time and transmitting it to the at least one display interface.
Finally, there is also described in accordance with a third preferred embodiment of the present invention a computer readable storage medium comprising instructions which, when executed by at least one processor, cause the at least one processor to perform a method for constructing a three-dimensional model of a dentition of at least one user.
From the above it can be seen that the important difference between the present invention and the prior art solutions is that the present system does not employ techniques such as active stereo, structured light or confocal microscopy, but uses a powerful neural model that allows it to estimate depth values using only passive stereo techniques. This means that the sensors of the intraoral devices are rather uncomplicated and inexpensive, since they comprise only ordinary cameras and a circuit for synchronizing them.
Furthermore, depth estimation from stereo images, as well as the reconstruction and post-processing, may be performed on servers disposed in the cloud, where the processing medium and the trained neural network are located. In this case, the dentist's computer only sends the information obtained through the intraoral device to the cloud and displays the state of the reconstruction in real time on a display interface so that the dentist can control the process. In contrast, conventional scanners perform reconstruction and post-processing on the dentist's own computer, which requires the computer to have sufficiently powerful hardware to run the processing algorithms. The algorithms used by these solutions must therefore generally be adapted to this type of computer, which directly compromises the quality of the three-dimensional model obtained. This does not occur in the present invention, where the quality of modeling is improved because there is no such limitation on processing capacity.
Finally, none of the prior art solutions provides the possibility of analyzing data sent by a plurality of intraoral devices using a single processing medium, which optimizes the use of resources, reduces the costs associated with the implementation of the system, and makes this type of technique more accessible to dentists and patients.
Drawings
As part of the present invention, the following representative figures are presented which illustrate preferred configurations of the present invention and, therefore, should not be construed as limiting the claimed subject matter.
Fig. 1 shows a block diagram of an intraoral scanning procedure according to prior art.
Fig. 2 shows the general scheme of a passive stereoscopic technique according to a preferred arrangement of the present invention.
Fig. 3 shows a block diagram of an intraoral scanning procedure according to a preferred arrangement of the present invention.
Detailed Description
Referring to the drawings, FIG. 1 shows a block diagram of a conventional intraoral scanning process for acquiring three-dimensional images of a patient's dentition. In particular, a first stage (1) of data capture and reconstruction, consisting of two sub-stages, is observed. The first sub-stage (1a) is the actual scan performed in the mouth of the patient by an expert, such as a dentist. This is done with a physical device inserted into the user's mouth, the physical device comprising at least one data capturing device. Such scanners typically work under the theoretical principles of confocal microscopy or structured light (active stereo). During scanning, the scanner transmits its captured data in real time to the dentist's computer; these data are typically images, depth values and measurements from inertial measurement units (IMUs), which include accelerometers and gyroscopes to measure acceleration and angular velocity. Once the information obtained by the scanner is sent to the expert's computer, the computer integrates the images, depth values and other measured values sequentially, estimating the pose of the camera in each image and thus constructing a three-dimensional model (sub-stage (1b)). In this way, a first three-dimensional model of the patient's dentition is obtained, which is still inaccurate and unclean (block (2)).
In view of the fact that the three-dimensional model obtained in the first stage is not suitable for dental treatment, a post-processing stage (3) is required in which the first three-dimensional model is cleaned by removing noise points and points that do not correspond to dentition (sub-stage (3a)). Thereafter, the dentist's computer recalculates the reconstruction (sub-stage (3b)) using all the information received during the scan. Finally, the camera pose and depth values are optimized to minimize the re-projection error (sub-stage (3c)), which corresponds to the difference between the captured image and the image generated from the three-dimensional reconstruction. A corrected and cleaned three-dimensional model is then obtained (block (4)), which the expert can use to develop a specific treatment for the patient (block (5)), such as invisible orthodontic appliances, occlusal relaxation plates, crowns, etc.
It should be noted that these prior art techniques require the expert to have a computer (10) with high processing capabilities, given the number of images to be processed, that is able to execute the processing algorithms required for generating the three-dimensional model and on which at least the stages described in blocks (1), (2) and (3) have to be performed.
On the other hand, fig. 2 shows the general scheme of the passive stereoscopic technique used in the present invention, which illustrates how the scanner or intraoral device (30) operates. In the embodiment shown in fig. 2, the intraoral device (30) comprises left and right cameras (12a, 12b) located on a baseline (11), wherein the cameras (12a, 12b) in turn comprise left and right lenses (13a, 13b), respectively. The left and right cameras (12a, 12b) may be placed parallel to the baseline (11) or at an angle thereto.
Each camera (12a, 12b) of the intraoral device (30) forms an image plane (14a, 14b) onto which real points (15, 16) are projected in each image obtained by the cameras (12a, 12b). As previously described, the intraoral device (30) uses passive stereo techniques to estimate the depth value (17) of the scanned surface from pairs of synchronized images and an algorithm that processes the images, without the need to project anything onto the scanned surface, unlike other prior art solutions to this problem.
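For reference, the geometric relationship underlying passive stereo can be written out for the simplest case. The following assumes rectified images from two parallel cameras; the symbols are introduced here for illustration and do not appear in the figures: f is the focal length in pixels, B the length of the baseline (11), x_L and x_R the horizontal image coordinates of the same real point in the left and right image planes (14a, 14b), d the disparity, and Z the depth value (17).

```latex
\[
d = x_L - x_R, \qquad Z = \frac{f\,B}{d}
\]
```

As a purely illustrative calculation with hypothetical values, f = 800 pixels, B = 5 mm and d = 200 pixels give Z = (800 × 5) / 200 = 20 mm. Because Z is inversely proportional to d, an error in the estimated disparity translates directly into a depth error, which is why the correspondence between the two images must be found reliably. If the cameras are placed at an angle to the baseline, the images would first have to be rectified for this relationship to hold.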
This is extremely relevant in scenes with little texture and/or with reflections, such as the mouth of a patient, where conventional passive stereo algorithms tend to work poorly because they search for keypoints in each image and then attempt to match those keypoints of one image with those of the other image (stereo matching). Since the surfaces inside the mouth mostly have little texture and/or are reflective, the keypoints are indistinct and difficult to locate, so these traditional algorithms make many mistakes, giving inaccurate results.
The present invention replaces the traditional stereo matching algorithm with a neural network that computes image depth values without locating explicit keypoints, which improves the accuracy in low-texture and/or reflective situations, for example in the patient's mouth. To achieve this accuracy, the neural network must be trained with highly realistic synthetic images and depth values.
Finally, fig. 3 shows a block diagram of an intraoral scanning procedure in accordance with a preferred embodiment of the present invention. In particular, a first data capture stage (100) is shown, which is divided into two sub-stages. In the first sub-stage (100a), the expert captures pairs of synchronized images (stereoscopic images) with a scanner or intraoral device, which transmits them to the expert's computer either wirelessly or through a data cable. In the second sub-stage (100b), the computer transmits the stereoscopic images to a server or cloud in real time. The intraoral device preferably consists of two cameras and circuitry to synchronize the capturing of the two cameras; the depth values are calculated from these images by a neural network, preferably located in the same cloud that receives the images. As a result of the first data capture stage (100), a stereoscopic image stream (block (200)) is obtained from the intraoral device and transmitted to the expert's computer and then to the cloud.
This is an important difference between the prior art and the present invention: the system of the present invention only requires the expert's computer (10) during this data capture stage (100), after which the information is processed in the cloud, where the processing algorithm and the neural network are hosted. This eliminates the need for the expert to have a computer with high processing power; an electronic device that can connect to the cloud, such as a tablet or smartphone, is sufficient.
After this first stage (100) of capturing data and sending it to the cloud, in which a stereoscopic image stream (200) is obtained, a reconstruction stage (300) is performed in the cloud. It comprises a depth map calculation sub-stage (300a), in which a neural network computes a depth map for each pair of images, and a sequential integration sub-stage (300b), in which the depth maps and the images are integrated into the three-dimensional reconstruction. The neural network compares the two images (e.g., RGB) and estimates a depth value for each pixel based on the relative displacement of the object between the two images. In contrast, conventional scanners use confocal microscopy or structured light to estimate depth values, which is less accurate and requires a significant texture on the surface. As a result of this reconstruction stage (300) in the cloud, an inaccurate and unclean three-dimensional model is obtained (block (400)).
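The sequential integration sub-stage can be pictured as back-projecting each depth map into three-dimensional points and moving them into the coordinate frame of the model using the estimated camera pose. The sketch below assumes a pinhole camera model and a pose that is already known; the intrinsic parameters (fx, fy, cx, cy), the pose and the depth values are hypothetical, and a real localization and mapping module would also estimate the pose and fuse overlapping surfaces.

```python
def backproject(depth_map, fx, fy, cx, cy):
    """Turn a depth map into 3D points in the camera frame (pinhole model)."""
    points = []
    for v, row in enumerate(depth_map):
        for u, z in enumerate(row):
            if z > 0:  # skip pixels without a valid depth value
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def to_model_frame(points, rotation, translation):
    """Apply a rigid camera pose (3x3 rotation given as rows, plus translation)."""
    return [
        tuple(
            sum(rotation[r][c] * p[c] for c in range(3)) + translation[r]
            for r in range(3)
        )
        for p in points
    ]


def integrate(model_points, depth_map, intrinsics, pose):
    """Append the points of one depth map to the growing model."""
    rotation, translation = pose
    camera_points = backproject(depth_map, *intrinsics)
    model_points.extend(to_model_frame(camera_points, rotation, translation))
    return model_points


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pose = (identity, (1.0, 0.0, 0.0))        # hypothetical pose: shifted 1 unit in x
intrinsics = (100.0, 100.0, 0.0, 0.0)     # hypothetical fx, fy, cx, cy
depth_map = [[10.0, 0.0], [10.0, 10.0]]   # one pixel has no valid depth

model = integrate([], depth_map, intrinsics, pose)
print(model)
```

Calling `integrate` once per stereoscopic image, each time with its own pose, accumulates the points of successive views into a single model, which is the sense in which the integration is sequential.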
Once the first three-dimensional model (400) is obtained, the post-processing stage (500), which includes three sub-stages, is performed in the cloud. In the first sub-stage (500a), the three-dimensional reconstruction is cleaned by removing noise points or points that do not correspond to dentition. In the second sub-stage (500b), the processing medium recalculates the reconstruction using all the information received during the scanning and data capture stage (100). Finally, in the third sub-stage (500c), the camera pose and depth values are optimized to minimize the re-projection error, which, as described above, corresponds to the difference between the captured image and the image generated from the reconstruction.
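To make the quantity being minimized more concrete, the sketch below computes a re-projection error in its common geometric form: model points are projected into the image with a pinhole camera, and the pixel distance to where they were actually observed is averaged. This is an assumption made for illustration; the description above speaks of a difference between images, which may correspond to a different (for example photometric) formulation. The intrinsic parameters and coordinates are hypothetical.

```python
import math


def project(point, fx, fy, cx, cy):
    """Project a 3D point given in the camera frame onto the image plane."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)


def mean_reprojection_error(points, observations, intrinsics):
    """Average pixel distance between projected model points and the pixel
    locations at which they were observed in the captured image."""
    errors = [
        math.dist(project(p, *intrinsics), obs)
        for p, obs in zip(points, observations)
    ]
    return sum(errors) / len(errors)


intrinsics = (100.0, 100.0, 50.0, 50.0)                 # hypothetical fx, fy, cx, cy
model_points = [(0.0, 0.0, 10.0), (1.0, 0.0, 10.0)]     # already in the camera frame
observed_pixels = [(50.0, 50.0), (63.0, 54.0)]          # second point is off by (3, 4)

error = mean_reprojection_error(model_points, observed_pixels, intrinsics)
print(error)
```

An optimizer would adjust the camera poses and depth values so that this error decreases across all captured images; the optimization itself is outside the scope of this sketch.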
As a product of the reconstruction and post-processing stages in the cloud, a corrected and cleaned three-dimensional model is obtained (block (600)) with higher accuracy than in the prior art. The expert can download the three-dimensional model generated in the cloud and use it for dental treatment (700), such as an invisible orthodontic appliance, an occlusal relaxation plate, a crown, etc.
In this sense, it is important to emphasize the role of the reconstruction stage (300) and the post-processing stage (500) in the cloud (20). The processing medium located in the cloud (20) comprises the trained neural network and the algorithms constituting the localization and mapping module, which integrates at least one stereoscopic image and at least one depth map sequentially into the generated three-dimensional model. As described above, this not only avoids the need for the expert to have equipment with high processing power, but also makes available, in the cloud, a processing tool more powerful than those used in the prior art, which are limited by the processing power of the intraoral equipment or the expert's computer. As a result, the expert receives corrected and cleaned three-dimensional models in real time, thanks to a neural network running with high processing power that allows more accurate results to be obtained in a shorter time.
Finally, it is also emphasized that the system of the present invention can operate with multiple intraoral devices through a single cloud. This can significantly reduce the costs associated with implementing the system and make it available to most specialists, since only one processing medium is required, whereas in prior art solutions the cost of the entire processing system must be paid each time a specialist purchases the product.
Reference numerals
1 Data capture and reconstruction
1a Scan by an expert and data sent to a computer
1b Reconstruction of the three-dimensional model
2 Inaccurate and unclean three-dimensional model
3 Post-processing
3a Cleaning of the inaccurate and unclean three-dimensional model
3b Recalculation of the reconstruction
3c Optimization of camera pose and depth values
4 Corrected and cleaned three-dimensional model
5 Treatment
10 Expert's computer
11 Baseline
12a Left camera
12b Right camera
13a Left lens
13b Right lens
14a Left image plane
14b Right image plane
15, 16 Real points
17 Depth value
20 Cloud
30 Intraoral device
100 Data capture
100a Scan by an expert and data sent to a computer
100b Real-time transmission of stereoscopic images to the cloud
200 Stereoscopic image stream
300 Reconstruction in the cloud
300a Depth map calculation
300b Sequential integration into the three-dimensional reconstruction
400 Inaccurate and unclean three-dimensional model
500 Post-processing in the cloud
500a Cleaning of the inaccurate and unclean three-dimensional model
500b Recalculation of the reconstruction
500c Optimization of camera pose and depth values
600 Corrected and cleaned three-dimensional model
700 Treatment