WO2025189245A1 - Machine learning for three-dimensional mesh generation based on images - Google Patents
Machine learning for three-dimensional mesh generation based on images
Info
- Publication number
- WO2025189245A1 (PCT/AU2025/050236; AU2025050236W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- user
- image
- measurement system
- head
- images
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T17/20—Finite element generation, e.g. wire-frame surface description, tesselation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T19/00—Manipulating 3D models or images for computer graphics
- G06T19/20—Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61M—DEVICES FOR INTRODUCING MEDIA INTO, OR ONTO, THE BODY; DEVICES FOR TRANSDUCING BODY MEDIA OR FOR TAKING MEDIA FROM THE BODY; DEVICES FOR PRODUCING OR ENDING SLEEP OR STUPOR
- A61M16/00—Devices for influencing the respiratory system of patients by gas treatment, e.g. ventilators; Tracheal tubes
- A61M16/06—Respiratory or anaesthetic masks
- A61M2016/0661—Respiratory or anaesthetic masks with customised shape
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
- G06T2207/30201—Face
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2210/00—Indexing scheme for image generation or computer graphics
- G06T2210/41—Medical
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2219/00—Indexing scheme for manipulating 3D models or images for computer graphics
- G06T2219/20—Indexing scheme for editing of 3D models
- G06T2219/2008—Assembling, disassembling
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2219/00—Indexing scheme for manipulating 3D models or images for computer graphics
- G06T2219/20—Indexing scheme for editing of 3D models
- G06T2219/2016—Rotation, translation, scaling
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2219/00—Indexing scheme for manipulating 3D models or images for computer graphics
- G06T2219/20—Indexing scheme for editing of 3D models
- G06T2219/2021—Shape modification
Definitions
- Embodiments of the present disclosure generally relate to computer vision and machine learning. More specifically, embodiments relate to using machine learning to generate three-dimensional meshes based on image data.
- CPAP: continuous positive airway pressure
- a method includes: accessing a set of two-dimensional images of a user; generating, based on processing the set of two-dimensional images using a first machine learning model, a three-dimensional mesh depicting a head of the user, wherein the three-dimensional mesh is scaled to a size of the head of the user; modifying the three-dimensional mesh to remove one or more facial expressions; determining a set of facial measurements based on the modified three-dimensional mesh; and selecting a user interface for the user based on the set of facial measurements.
- FIG. 1 depicts an example workflow for generating meshes and selecting user interfaces, according to some embodiments of the present disclosure.
- FIG. 2 depicts an example workflow to facilitate image data collection for improved mesh generation, according to some embodiments of the present disclosure.
- FIG. 3 depicts an example workflow for improved interface selection based on generated meshes, according to some embodiments of the present disclosure.
- FIG. 4 is a flow diagram depicting a method for using machine learning model(s) to generate three-dimensional meshes and select interface components, according to some embodiments of the present disclosure.
- FIG. 5 is a flow diagram depicting a method for collecting and evaluating image data, according to some embodiments of the present disclosure.
- FIGS. 6A and 6B illustrate a flow diagram depicting a method for collecting user image data, according to some embodiments of the present disclosure.
- FIG. 7 depicts an example user interface for collecting user image data, according to some embodiments of the present disclosure.
- FIG. 8 is a flow diagram depicting a method for improved interface selection, according to some embodiments of the present disclosure.
- FIG. 9 is a flow diagram depicting a method for using machine learning to select user interfaces, according to some embodiments of the present disclosure.
- FIG. 10 depicts an example computing device configured to perform various embodiments of the present disclosure.
- identical reference numerals have been used, where possible, to designate identical elements that are common to the figures. It is contemplated that elements and features of one embodiment may be beneficially incorporated in other embodiments without further recitation.
- Embodiments of the present disclosure provide apparatuses, methods, processing systems, and computer-readable mediums for machine learning (ML)-based interface selections based on three-dimensional meshes.
- a measurement system may be configured to evaluate a set of images captured via an imaging sensor (e.g., a webcam) using machine learning to generate a three-dimensional mesh corresponding to the face of a user depicted in the image(s).
- such meshes may be used to generate accurate biometric measurements.
- the system can use two-dimensional images to generate the three-dimensional mesh, and need not rely on complex three-dimensional imaging systems. This enables the system to be implemented using a wide range of widely available imaging sensors, including web cameras, mobile device cameras, and the like.
- a deep learning model is trained end-to-end to generate three-dimensional meshes having appropriate size or scale, relative to the user’s face. That is, the mesh may be scaled according to the size of the user’s face, ensuring that measurements taken based on the mesh (e.g., nose size, mouth position, and the like) are accurate.
- the user may be directed to move a body part, such as their face, through a range of motion while an imaging sensor captures images of the body part at different angles. Trained machine learning models may then be used to evaluate the image(s) to confirm that they are satisfactory (e.g., that the user’s ear is visible in one or more pictures), which helps ensure accurate mesh generation.
- the measurement system may perform various operations using the generated mesh.
- the measurement system may morph or deform the mesh to remove facial expressions, if any, depicted in the image. For example, if the user is smiling, raising their eyebrows, or making some other expression, the system may modify the mesh to remove such expression(s).
- a wide variety of facial expressions may result in inaccurate facial measurements (e.g., inaccurate nostril size prediction due to deformation of the user’s skin when they smile). Therefore, by manipulating the three-dimensional mesh to remove such expressions, the measurement system can ensure that the captured measurement data is highly accurate.
- the mesh can be used to estimate, calculate, compute, or otherwise determine a set of facial measurements of the user.
- the particular measurements collected may vary depending on the particular implementation and task.
- the measurement system generates the facial measurements to facilitate selecting and/or fitting of one or more devices or components designed to be worn on the head or face of the user, such as user interfaces (e.g., masks) for respiratory therapy (e.g., CPAP).
- the measurement system may determine measurements such as the face height, nose width, nose depth, nostril size, and the like.
- the particular facial measurements that are determined and used may vary depending on the particular task (e.g., to allow determination of proper sizing for conduits (e.g., for tube-up masks), head gear, nostril sizes for pillow masks, and the like).
- these measurements can be used to select, design, customize, or otherwise retrieve a facial device or user interface for the user, such as an appropriately-fitted mask for the user, to ensure functionality, comfort, and stability.
- FIG. 1 depicts an example workflow 100 for generating meshes and selecting user interfaces, according to some embodiments of the present disclosure.
- a measurement system 110 accesses a set of image(s) 105 and generates or selects an interface 150 (e.g., a recommendation or selection of the interface 150) based on the image(s) 105.
- “accessing” data may generally include receiving, retrieving, requesting, obtaining, collecting, capturing, measuring, or otherwise gaining access to the data.
- the image(s) 105 may be captured via one or more imaging sensors (e.g., a webcam or a camera on a smartphone) and may be transmitted to the measurement system 110 via one or more communication links.
- the measurement system 110 is implemented as a cloud-based service that evaluates user images 105 to generate the interfaces 150. For example, users may use an application to capture the image(s) 105 on their local devices (e.g., the user’s laptop or phone), and may then upload the image(s) 105 to the measurement system 110 for evaluation.
- the measurement system 110 may be implemented using hardware, software, or a combination of hardware and software. Further, though illustrated as a discrete system for conceptual clarity, in some embodiments, the operations of the measurement system 110 may be combined or distributed across any number and variety of devices and systems.
- the image(s) 105 generally correspond to two-dimensional images that depict the head and/or face of the user.
- the image(s) 105 depict the user from multiple angles or orientations
- the image(s) 105 may include a frontal image (e.g., captured while the user’s face is angled directly towards the imaging sensor, such that the image depicts the face of the user from straight on), one or more side or profile images (e.g., captured while the user turned their face towards the left and/or right side of the imaging sensor, such that the image(s) depict the side of the user’s face and/or the user’s ear(s)), a bottom image (e.g., captured while the user looked upwards relative to the imaging sensor, such that the image depicts the user’s chin, neck, and/or nostrils), and/or a top image (e.g., captured while the user looked downwards relative to the imaging sensor, such that the image depicts the top of the user’s head).
- the measurement system 110 may receive various other data, such as metadata associated with one or more images 105 (e.g., indicating characteristics such as the field of view (FOV) or focal length of the camera that captured the image(s) 105).
- the image(s) 105 are accessed by an image component 115.
- the image component 115 generally facilitates collection and evaluation of the images 105.
- the particular operations performed by the image component 115 may vary depending on the particular implementation.
- the image component 115 may perform various preprocessing operations, such as to enhance contrast, reduce noise, resize the images, crop the images, perform color correction on the images, perform feature extraction, and the like.
- the image component 115 may evaluate one or more of the image(s) 105 to confirm that the image(s) 105 are suitable for mesh generation.
- the image component 115 may use one or more machine learning models to detect the presence (or absence) of various facial features or landmarks that are useful in mesh generation, such as the ear(s) of the user in the profile image(s). In some embodiments, if such landmarks are not visible, the image component 115 may request additional image(s) 105 to improve the measurement process.
- the image component 115 provides image data 120 to a mesh component 125.
- the image data 120 may correspond to or comprise the image(s) 105 themselves, and/or may correspond to the image(s) 105 after preprocessing operation(s) are applied, such as noise reduction or resizing.
- the image data 120 corresponds to or comprises the results of feature extraction. That is, the image data 120 may comprise feature map(s) generated for the image(s) 105.
- the mesh component 125 processes the image data 120 to generate a mesh 130.
- the mesh component 125 uses one or more machine learning models to generate the mesh 130.
- the mesh component 125 may use a deep learning model (e.g., a convolutional neural network) to generate the mesh 130.
- the mesh component 125 uses the machine learning model(s) to fit a statistical shape model representing a statistically average face to the image data 120, causing the mesh 130 to depict or correspond to the face of the user.
- the mesh component 125 uses a camera model to scale the mesh 130. For example, in some embodiments, the mesh component 125 may determine the FOV of the camera used to capture the image(s) 105 (e.g., from metadata associated with the images 105, and/or by processing the images themselves). In some embodiments, the perceived size of various facial landmarks may change as the landmarks move closer to or further from the camera. Based on the perceived changes in size of the landmark(s) (e.g., the user’s head, or more granular landmarks such as eyes or ears), in some embodiments, the mesh component 125 can use a camera model to determine or infer the FOV of the camera and/or the distance between the camera and the landmark(s).
- the mesh component 125 can determine the scale of the face or features therein. For example, after determining that the user’s nose is N millimeters away from the camera and that the FOV of the camera is X degrees, the mesh component 125 may determine the actual size of the user’s nose (e.g., in millimeters).
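- The arithmetic behind this scaling step can be illustrated with a simple pinhole-camera calculation, as sketched below. This is not the disclosed camera model; the function name, the 70-degree FOV, and the pixel and distance values are illustrative assumptions only.

```python
import math


def landmark_size_mm(pixel_size: float, image_width_px: int,
                     horizontal_fov_deg: float, distance_mm: float) -> float:
    """Estimate the real-world width of a landmark from its pixel width.

    Assumes a simple pinhole camera: the focal length in pixels follows from
    the horizontal field of view, and size scales linearly with distance.
    """
    focal_px = (image_width_px / 2.0) / math.tan(math.radians(horizontal_fov_deg) / 2.0)
    return pixel_size * distance_mm / focal_px


# Illustrative numbers only: a nose spanning 120 px in a 1920 px-wide frame,
# captured by a 70-degree-FOV camera from roughly 400 mm away.
print(round(landmark_size_mm(120, 1920, 70.0, 400.0), 1))  # about 35 mm
```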
- the mesh component 125 may use a deep learning model that generates an appropriately scaled mesh 130.
- the model may be trained based on facial exemplars to generate the mesh 130 in a way that inherently understands the scale of the face, without using a separate camera model (e.g., without explicitly determining or evaluating the FOV of the camera, for example).
- the measurement system 110 or another system may use relatively dense exemplars, such as images and corresponding meshes or dense coordinates of facial landmarks.
- some or all of the training data comprises synthetic data.
- accurate three-dimensional meshes or models of synthetic heads and/or faces may be generated using various computer programs (e.g., models of people that are not real individuals or users, but where the models are nevertheless realistic).
- the measurement system 110 (or another system) may render image(s) depicting the modeled head from various angles (e.g., by placing a virtual camera in the virtual space at various positions around the head). These images may be used as the training input to the model, paired with some or all of the mesh itself (e.g., data points in three-dimensional space, such as defining various landmarks of the face) used as the target output.
- the training data may include real data (e.g., real images of a user, coupled with highly accurate three-dimensional data points).
- users may volunteer to use a scanning device capable of capturing image data and three-dimensional positioning data for their face.
- the mesh 130 is a three-dimensional mesh representing at least a portion of the user’s head and face.
- the mesh 130 may depict the user’s face, a portion of their neck, and/or a portion of their head (e.g., including the ears).
- the mesh 130 is accessed by a measurement component 135.
- the measurement component 135 generates a set of measurements 140 based on the mesh 130.
- the measurement component 135 may apply one or more preprocessing operations. For example, the measurement component 135 may morph or deform the mesh 130 to remove the facial expression(s) of the user, if any, resulting in a mesh that reflects the face of the user in a neutral expression.
- the particular measurements captured by the measurement component 135 may vary depending on the particular implementation and task.
- the measurement component 135 may evaluate the mesh 130 to determine features such as the nose width, nose height, and/or nose depth of the user. Such measurements may be useful to select or provide facial devices such as a face mask (e.g., a respiratory therapy mask) that covers the nose of the user.
- the measurement component 135 may measure the height and/or width of the user’s mouth, and/or the positioning of the mouth relative to the nose, for similar reasons.
- the measurement component 135 may determine the overall size of the user’s head (e.g., the circumference of the user’s head), which may be a useful metric for conduit and/or headgear sizing (e.g., to select a conduit that is sufficiently large to comfortably reach around the user’s head without being too large such that it is uncomfortable, and/or to select headgear that will fit comfortably).
- headgear refers to the straps, bands, or other components used to secure the user interface to the user’s nose, mouth, or both.
- the “conduit” refers to a tube that connects the user interface (e.g., a CPAP mask) to the respiratory therapy device (e.g., the flow generator) and provides airflow to the user, from the flow generator, via the interface.
- the measurement component 135 may determine the length of the conduit path along the user’s face (e.g., along the path where the conduit is designed to sit, such as from the nose and/or mouth and up over each ear).
- the measurement component 135 may construct or identify a number of points, on the face and/or head of the user (as reflected by the mesh 130), where the conduit should lie when in use. For example, the measurement component 135 may identify a point under the nose (e.g., in the middle of the user’s philtrum), one or more points on each cheek (e.g., at a defined location, such as relative to the nostril or another facial feature, such as the temporal process of the zygomatic bone), a point between the ear and the eye (e.g., at the midline between the user’s left ear and left eye, as well as between the user’s right ear and right eye), a point on the top (e.g., uppermost point) of the user’s head, and the like.
- the measurement component 135 may similarly identify or place intermediate points on the surface of the mesh in between the above-referenced landmark points. The measurement component 135 may then connect the point(s) with a spline (lying on the surface of the mesh), and measure the length of the spline. In some embodiments, the length of this spline may then be used to determine or infer the appropriate conduit size, as discussed below.
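- As a rough illustration of the spline-length measurement described above, the sketch below sums straight segment lengths between ordered surface points. The landmark coordinates and the polyline simplification (in place of a true surface spline) are assumptions for illustration, not the disclosed measurement procedure.

```python
import numpy as np


def path_length_mm(points: np.ndarray) -> float:
    """Sum of segment lengths along an ordered set of 3D points (N x 3, in mm).

    Stand-in for measuring the conduit spline: in practice the landmark and
    intermediate points would be sampled from the mesh surface and connected
    with a smooth spline before measuring.
    """
    segments = np.diff(points, axis=0)  # vectors between consecutive points
    return float(np.linalg.norm(segments, axis=1).sum())


# Hypothetical landmark points (philtrum, cheek, temple, crown), in mm.
conduit_points = np.array([
    [0.0, -40.0, 90.0],   # under the nose
    [45.0, -20.0, 75.0],  # right cheek
    [70.0, 20.0, 20.0],   # between right ear and eye
    [0.0, 60.0, -95.0],   # top of the head
])
print(round(path_length_mm(conduit_points), 1))
```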
- the measurement component 135 may construct or identify a number of points, on the face and/or head of the user (as reflected by the mesh 130), in a similar manner to the above-discussed conduit spline (where the particular points may differ). For example, the measurement component 135 may identify points located in locations where the various straps or other components of the headgear will fit on the user’s head. The measurement component 135 may then construct a spline connecting these points, as discussed above. In some aspects, the measurement component 135 may similarly generate multiple such splines along the mesh surface (e.g., for each portion or strap of the headgear), measuring the length of each headgear spline. In some embodiments, the length of these headgear spline(s) may then be used to determine or infer the appropriate headgear size, as discussed below.
- the measurement component 135 may determine the nostril size of the user based on the mesh 130. For example, the measurement component 135 may characterize the nostril(s) of the user using four parameters defining an ellipse: the major axis and minor axis of the ellipse, the rotation of the nostril/ellipse relative to a fixed orientation (e.g., relative to the plane of the face), and the distance between the nostril/ellipse and the centerline of the mesh 130 (e.g., the centerline of the user’s face).
- the measurement component 135 may use a variety of polygons having any number of sides to define the shape of the nostril.
- the measurements 140 are accessed by a selection component 145.
- the selection component 145 evaluates the measurements 140 to select or generate the interface 150.
- the selection component 145 may evaluate one or more of the measurements 140 using one or more thresholds or mappings to select various components of the interface 150. For example, based on the nose size and/or shape of the user, the selection component 145 may evaluate predefined mappings indicating which interface(s) will fit best or be most comfortable.
- the selection component 145 may determine that a first nasal-only mask may be too small to comfortably fit, that a second nasal-only mask will be too large (e.g., such that air leak occurs), and/or that a third nasal-only mask will fit well and be comfortable. As another example, based on the measurements 140, the selection component 145 may determine that the user should use a particular type or model of full-face masks (e.g., an oronasal mask).
- the selection component 145 may evaluate the conduit spline length (discussed above) using a rules-based or threshold-based approach (e.g., selecting a conduit size based on the range into which the spline length falls), and/or may process the spline length using a machine learning model (e.g., a trained classifier) to select the conduit size.
- the selection component 145 may evaluate the length(s) of the headgear spline(s) (discussed above) using a rules-based or threshold-based approach (e.g., selecting a headgear size based on the range into which the spline lengths fall), and/or may process the spline length(s) using a machine learning model (e.g., a trained classifier) to select the headgear size.
- the selection component 145 may evaluate some or all of the measurements 140 to select a nasal pillow size for the user.
- Nasal pillows are generally soft inserts that fit partially into the nostrils of the user, providing airflow via the nostrils (whereas a nasal mask fits over the nose, and a full-face mask fits over the nose and mouth).
- the selection component 145 uses a classifier machine learning model to select the pillow size based on the nostril measurements.
- the classifier may process the measurements such as nostril major and minor axes, rotation, and/or distance to centerline to generate a classification indicating which size pillow would fit the user best.
- the classifier may be a relatively small or simple machine learning model.
- the measurement system 110 may train the nostril classifier using labeled exemplars.
- the training data may include nostril measurements (as discussed above) of one or more users, where the label for each training sample indicates the pillow size that the user found most comfortable (or that otherwise led to the best results, such as the minimum air leakage).
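- A minimal sketch of such a nostril-measurement classifier is shown below, assuming scikit-learn as the modeling library; the feature values, labels, and the choice of a random forest are illustrative and not taken from the disclosure.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Each row: nostril major axis (mm), minor axis (mm), rotation (deg),
# distance to the face centerline (mm). Labels are the pillow size each
# (hypothetical) user found most comfortable.
X = np.array([
    [9.5, 5.0, 12.0, 8.0],
    [11.0, 6.5, 10.0, 9.0],
    [13.5, 8.0, 15.0, 10.5],
    [8.5, 4.5, 8.0, 7.5],
])
y = np.array(["S", "M", "L", "S"])

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict([[10.5, 6.0, 11.0, 9.0]]))  # e.g. ["M"]
```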
- the generated or selected interface 150 may include selections for a variety of interface components, as discussed above.
- the interface 150 may indicate a recommended user interface style or design (e.g., nasal only, full-face, or nasal pillow), a recommended model or size of interface (e.g., from a set of alternative options), a recommended conduit sizing, a recommended headgear sizing, a recommended pillow size (for nasal pillow masks), and the like.
- the measurement system 110 may delete the user image(s) 105 and/or mesh 130 after processing in order to preserve user privacy. For example, in some embodiments, once the mesh 130 is generated, the measurement system 110 may delete the images 105 and image data 120. Further, once the measurements 140 are generated, the measurement system 110 may delete the mesh 130. Additionally, in some embodiments, once the interface 150 is generated, the measurement system 110 may delete the measurements 140.
- the measurement system 110 can provide the selected interface 150 to the user depicted in the images 105.
- the interface 150 is indicated to another user, such as a healthcare provider of the depicted user, who can facilitate ordering and/or delivery of the indicated equipment.
- the measurement system 110 can use machine learning to generate accurate three-dimensional meshes, and then evaluate these meshes to select or recommend equipment for respiratory therapy in a highly granular way. This can improve the results achieved by users (e.g., improving the progress of the therapy) while reducing or eliminating negative outcomes (e.g., discomfort due to poorly fitted masks, substantial air leak, and the like).
- FIG. 2 depicts an example workflow 200 to facilitate image data collection for improved mesh generation, according to some embodiments of the present disclosure.
- the workflow 200 may be performed by a measurement system, such as the measurement system 110 of FIG. 1.
- the image component 115 of the measurement system comprises an evaluation component 205 and a preprocessing component 210. Although depicted as discrete components for conceptual clarity, in some embodiments, the operations of the depicted components (and others not illustrated) may be combined or distributed across any number of components.
- the image component 115 provides one or more instructions 215 to a user device 220.
- the instructions 215 may generally include any information to facilitate collection of user images.
- the instructions 215 may include textual instructions, pictorial instructions, video instructions, audio instructions, and the like.
- the instructions 215 may indicate how the user should position themselves (e.g., by superimposing an ellipse or a human head or face over a live feed from the camera of the user device 220). For example, the instructions 215 may instruct the user to look at the camera, to turn their head to either side of the camera, to look up, and the like. As one example, in some embodiments, the instructions 215 may include depicting or superimposing a box and/or two ellipses (one for each nostril), over the image(s) from the device’s camera, and text instructing the user to position their nostrils within the box and/or in the ellipses, and to capture the image when their nostrils are arranged appropriately. Such instructions 215 may enable improved image capture.
- the instructions 215 may include requesting that the user perform a breathing exercise to determine how well the user can breathe through their nose.
- the instructions 215 may cause the user device 220 to output an animation via a display, and ask the user to breathe (through their nose) in synchronization with the animation.
- the particular contents of the animation may vary depending on the particular implementation.
- the animation may include one or more circles (or other shapes) expanding and contracting, asking the user to inhale as the shape(s) expand, and exhale as the shape(s) contract. After one or more such breathing cycles, the instructions 215 may ask the user to indicate whether they were able to breathe comfortably during the exercise (or to rate their level of comfort).
- the instructions 215 may be provided using any number and variety of communication links.
- the instructions 215 may be transmitted via one or more wired and/or wireless networks (including the Internet) to the user device 220.
- the user device 220 is generally representative of any computing device that a user may use to capture and/or provide image(s) 105 to the image component 115.
- the user device 220 may correspond to a laptop computer, a desktop computer, a smartphone, a tablet, and the like.
- the user device 220 comprises one or more imaging sensors (e.g., cameras) integrated into the device or available as an external device (e.g., a plugin webcam).
- the image component 115 may provide one or more questions or surveys via the user device 220 to help guide the interface selection process. For example, in some embodiments, the user may be asked whether they have used any other interfaces within a defined period of time (e.g., the last thirty days), and if so, the user may be asked to provide further information such as the model or type of the prior interface(s), the model or type of their current interface, and/or a reason for why they switched (e.g., because they could not get a good seal, because the prior interface was uncomfortable, because they had facial markings or irritation, because they felt claustrophobic with the old interface, because air was leaking and/or they were mouth breathing, because the mask would not stay in place, and the like).
- the system may similarly ask the user to indicate whether they initiated the switch (as compared to, for example, their healthcare provider suggesting a switch). Such information may be useful to suggest a new interface for the user (e.g., to select full face, pillow, or nasal mask based on their responses and/or prior interface usage). For example, if the user indicated feelings of claustrophobia while using a full face mask, the measurement system 110 may suggest a nasal or pillow interface.
- the image component 115 may provide questions related to whether the user breathes through their mouth or otherwise has difficulty breathing through their nose. For example, the user may be asked whether they experience a variety of common concerns (e.g., dry mouth, nasal congestion or irritation, and the like).
- the system may ask the user if they have noticed any air leak from their current interface (if they are already participating in therapy), whether they breathe through their mouth when using the therapy, whether they find themselves breathing through their mouth when exerting themselves (e.g., when walking up stairs), whether the user, when asked to take a deep breath, finds it easier to breathe through their mouth or their nose, whether the user has any medical conditions that make breathing through the nose difficult (such as the common cold, chronic sinusitis, chronic allergies, deviated septum, and the like).
- such questions may be useful to allow the measurement system to select a good interface recommendation, as discussed above.
- the measurement system may further recommend specific types based on the user responses (e.g., suggesting a full face mask for users who have difficulty breathing through their nose or who otherwise tend to breathe through their mouth).
- the user device 220 transmits one or more image(s) 105 to the image component 115.
- the preprocessing component 210 may first perform one or more preprocessing operations on the images 105. For example, as discussed above, the preprocessing component 210 may resize the images 105 to a standard or default size, and/or perform a variety of operations such as contrast enhancement and noise reduction to improve the machine learning process.
- the evaluation component 205 may evaluate the images 105 (or the preprocessed image data generated by the preprocessing component 210) to determine whether the images 105 are acceptable. For example, the evaluation component 205 may use various machine learning models to detect whether the user’s face is depicted in the image(s) 105, whether there is sufficient lighting, and the like. In some embodiments, the evaluation component 205 uses a machine learning model trained to identify or detect whether ear(s) are depicted in an image 105. For example, the evaluation component 205 may process the image(s) 105 corresponding to when the user turned left and/or right in order to determine whether the user’s ear(s) are visible.
- Such landmarks may be useful to improve the mesh generation, as it may enable more accurate shape and sizing of the model head, which can improve headgear sizing.
- the image component 115 can send a new set of instructions 215 to the user asking them to try again (e.g., to take the profile picture again, but move their hair back and out of the way).
- this process may be repeated any number of times until an acceptable set of images 105 is obtained.
- the ear detection machine learning model may be a lightweight classifier that can be executed by the user device 220 to detect the ear visibility locally, allowing the user to immediately capture another image if needed. This may reduce network bandwidth consumed by the process (e.g., reducing the number of images 105 transmitted across the network) as well as reducing computational expense on the measurement system.
- the image component 115 may provide the images 105 (or image data generated therefrom, such as feature maps) to one or more other components of the measurement system (e.g., the mesh component 125 of FIG. 1), as discussed above.
- FIG. 3 depicts an example workflow 300 for improved interface selection based on generated meshes, according to some embodiments of the present disclosure.
- the workflow 300 may be performed by a measurement system, such as the measurement system 110 of FIG. 1.
- the selection component 145 of the measurement system comprises a mapping component 305 and a classifier component 310. Although depicted as discrete components for conceptual clarity, in some embodiments, the operations of the depicted components (and others not illustrated) may be combined or distributed across any number of components.
- the selection component 145 accesses the set of measurements 140 (e.g., facial measurements generated by a measurement component, such as the measurement component 135 of FIG. 1, based on a three-dimensional facial mesh depicting a user, such as the mesh 130 of FIG. 1).
- the measurements 140 generally include one or more measurements indicating the size, shape, and/or positioning of one or more facial landmarks in three-dimensional space, such as the relative size, shape, and/or position of the user’s eyes, nose, mouth, ears, nostrils, and the like.
- the selection component 145 also accesses a set of mappings 315.
- the mappings 315 generally indicate prior sizing of one or more components of user interfaces. That is, the mappings 315 may indicate, for one or more components (e.g., conduit sizes, interface types or models, headgear sizes, and the like) a range of measurements for which the component was designed and/or which the component will fit appropriately.
- the mappings 315 may indicate that a first nasal-only mask is best for users having a first set of nose measurements, while a second nasal-only mask is better for users having a second set of nose measurements.
- the mappings 315 may be defined or provided by the designers or manufacturers of the therapy components, and/or may be determined based on user interactions (e.g., surveying users to determine which component(s) they prefer).
- the mapping component 305 may evaluate some or all of the measurements 140 using the mappings 315 to select appropriate interface components. For example, as discussed above, the mapping component 305 may select one or more alternatives that align with the measurements 140, such as one or more interfaces, one or more conduits, one or more headgear sizes, and the like.
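- Conceptually, the mapping evaluation can be as simple as a range lookup, as in the sketch below. The conduit thresholds are made up purely to illustrate threshold-based selection; the actual mappings 315 would be supplied by component designers or derived from user data as noted above.

```python
# Hypothetical sizing table: each entry maps a spline-length range (mm) to a size.
CONDUIT_SIZES = [
    (0.0, 430.0, "small"),
    (430.0, 480.0, "medium"),
    (480.0, float("inf"), "large"),
]


def select_conduit(spline_length_mm: float) -> str:
    """Pick the first conduit size whose range contains the measured spline length."""
    for low, high, size in CONDUIT_SIZES:
        if low <= spline_length_mm < high:
            return size
    raise ValueError("spline length outside all defined ranges")


print(select_conduit(455.0))  # "medium"
```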
- the classifier component 310 may similarly evaluate some or all of the measurements to select appropriate interface components.
- the classifier component 310 may process the measurement data using one or more machine learning models in order to select the components.
- the classifier component 310 may process nostril measurements (e.g., the major and minor axes of an ellipse corresponding to the nostril, the rotation of the nostril, and/or the distance between the nostril and the center of the nose) using a classifier model to select a pillow size for the user.
- the selection component 145 generates an interface 150 based on the measurements 140. Although a single interface 150 (e.g., a single set of components) is depicted for conceptual clarity, in some embodiments, the selection component 145 may generate a set of alternatives.
- the selection component 145 may generate a first interface 150 for a nasal-only style (e.g., recommending a particular interface, conduit, and headgear if the user wants to use a nasal mask), a second interface 150 for a full-face style (e.g., recommending a particular interface, conduit, and headgear if the user wants to use a full-face mask), and/or a third interface 150 for a pillow style (e.g., recommending a particular interface, conduit, headgear, and pillow size if the user wants to use a pillow mask).
- the selection component 145 may indicate alternatives within the same style or type of mask. For example, suppose the mappings 315 include overlapping ranges of measurements for one or more components. In some embodiments, if the user’s measurements 140 lie in the overlapping region(s), the interface 150 may indicate that any of the alternatives may be suitable.
- the selection component 145 can provide substantially improved interface selection for users, resulting in improved therapy outcomes and comfort.
- FIG. 4 is a flow diagram depicting a method 400 for using machine learning model(s) to generate three-dimensional meshes and select interface components, according to some embodiments of the present disclosure.
- the method 400 may be performed by a measurement system, such as the measurement system 110 of FIG. 1 and/or the measurement systems discussed above with reference to FIGS. 2-3.
- the measurement system accesses a set of image(s) (e.g., the images 105 of FIG. 1 and/or FIG. 2).
- the images generally depict the head and/or face of a user.
- the images are provided in order to enable the measurement system to generate a three-dimensional mesh corresponding to the user’s face, allowing the measurement system (or another system) to capture highly accurate measurements relating to the size, shape, positioning, and/or orientation of various facial features or landmarks, such as the nose, nostrils, eyes, mouth, ears, and the like.
- accessing the images includes evaluating the images to confirm that they meet defined acceptance criteria, such as a minimum size, a minimum resolution, a minimum amount of lighting and/or contrast, and the like.
- the measurement system generates a mesh (e.g., the mesh 130 of FIG. 1) based on processing the accessed image(s) (or image data generated therefrom) using one or more machine learning models (e.g., a deep learning model). For example, as discussed above, the measurement system (or another system) may train a machine learning model using training samples, each sample comprising one or more images (depicting a respective user) as the input and a corresponding set of three-dimensional data points for a set of landmarks on the user’s face (or a mesh of the respective user’s face) used as the target or label. In some embodiments, as discussed above, some or all of the training samples may comprise synthetic data (e.g., synthetic or artificial face meshes used as the label, with rendered images of the meshes used as the input).
- the particular operations used to train the machine learning model may vary depending on the particular architecture.
- the measurement system may process the image(s) of a training sample as input to the model (e.g., a deep learning convolutional neural network) to generate a mesh.
- the mesh may then be compared against the label (e.g., the actual mesh or other data points in three-dimensional space) to generate a loss.
- the loss may generally use a variety of formulations, such as surface-to-surface loss, point-to-point loss, surface normal loss, Laplacian regularization loss, and the like.
- the parameters of the model may then be updated (e.g., using backpropagation) based on the loss.
- the model learns to generate more accurate output meshes based on input images.
- the model may be trained using individual samples (e.g., using stochastic gradient descent) and/or batches of samples (e.g., using batch gradient descent) over any number of iterations and/or epochs.
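- The training procedure described above can be sketched, under heavy simplification, as a standard supervised regression loop. In the example below (PyTorch assumed), the tiny network, the feature dimensionality, the vertex count, and the use of only a point-to-point (MSE) term are illustrative stand-ins rather than the disclosed architecture or loss formulation.

```python
import torch
from torch import nn

NUM_VERTICES = 468  # illustrative mesh resolution, not taken from the disclosure
IMG_FEATS = 128     # assumed pre-extracted image feature size

# Deliberately small stand-in for the mesh-regression network described above.
model = nn.Sequential(
    nn.Linear(IMG_FEATS, 256),
    nn.ReLU(),
    nn.Linear(256, NUM_VERTICES * 3),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)


def point_to_point_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Mean squared distance between predicted and ground-truth vertices.

    Surface-to-surface, surface-normal, or Laplacian-regularization terms
    could be added to this loss as mentioned above.
    """
    return ((pred - target) ** 2).mean()


# One synthetic training batch: image features paired with ground-truth vertices.
features = torch.randn(8, IMG_FEATS)
gt_vertices = torch.randn(8, NUM_VERTICES * 3)

for step in range(100):
    optimizer.zero_grad()
    pred = model(features)
    loss = point_to_point_loss(pred, gt_vertices)
    loss.backward()   # backpropagate the loss
    optimizer.step()  # update model parameters
```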
- the model can be used for runtime mesh generation based on input user images.
- the mesh generated by the machine learning model is already scaled to the size of the user’s head and/or facial features. That is, because the model may be trained end-to-end using relatively dense labels (e.g., dense point clouds and/or meshes), the model may inherently learn to predict the scale of the face, without separately predicting how the camera affects the perceived size (e.g., based on FOV and/or distance). For example, in some embodiments, the machine learning model learns to respect or recreate the identity (e.g., facial shape) of the person regardless of any variations in angle, background, FOV of the camera, distance of the camera, and the like (in a similar manner to how some facial recognition models work). In this way, the generated mesh may be inherently scaled correctly by the model.
- in other embodiments, the generated mesh may not be inherently scaled, and the measurement system may then scale the output mesh using a camera model, as discussed below.
- the measurement system may use the camera model and/or the FOV of the camera (if known) to predict the appropriate size for the mesh (or features therein). For example, objects further from the camera are perceived as smaller, relative to objects nearer to the camera. Therefore, the measurement system may evaluate the change(s) in perceived size of one or more facial landmarks (e.g., the user’s ears, nose, mouth, and the like) across the images in order to predict the FOV of the camera, the distance to the landmark(s), and/or the actual size of the feature(s). This allows the measurement system to scale the mesh accurately.
- use of a camera model may refer to using a perspective projection technique that projects the mesh from world space to camera space. If the parameters of the camera (e.g., FOV) are known, one or more projected keypoints can be compared with the ground truth keypoint locations (on the image) to determine the appropriate scaling of the mesh.
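- One simplified way to realize this comparison is a least-squares fit of a single scale factor under a weak-perspective approximation, as sketched below; the keypoint coordinates, focal length, and depth are hypothetical, and a full implementation would use the complete perspective projection described above.

```python
import numpy as np


def fit_scale_weak_perspective(mesh_kp: np.ndarray, image_kp: np.ndarray,
                               focal_px: float, depth_mm: float) -> float:
    """Least-squares scale aligning projected mesh keypoints with image keypoints.

    Weak-perspective assumption: every keypoint sits at roughly the same depth,
    so projection reduces to a single multiplicative factor.
    """
    projected = mesh_kp[:, :2] * (focal_px / depth_mm)  # unit-scale projection (px)
    num = float((projected * image_kp).sum())
    den = float((projected * projected).sum())
    return num / den


# Hypothetical values: three keypoints of a unit-scale mesh (mm), their observed
# pixel offsets from the image centre, an assumed focal length and distance.
mesh_kp = np.array([[30.0, 0.0, 0.0], [-30.0, 0.0, 0.0], [0.0, 45.0, 0.0]])
image_kp = np.array([[105.0, 0.0], [-105.0, 0.0], [0.0, 157.0]])
print(round(fit_scale_weak_perspective(mesh_kp, image_kp, 1400.0, 400.0), 3))
```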
- the measurement system removes facial expression(s) present in the mesh, if present.
- the measurement system may deform the mesh to place the face in a neutral position (e.g., to remove expressions such as smiling, an open mouth, raised eyebrows, and the like).
- a statistical model comprising a shape kernel (e.g., indicating a statistically average head shape) and one or more expression kernels (e.g., indicating various facial expressions) may be used.
- the kernel(s) generally correspond to statistical models generated using principal component analysis (PCA) on facial datasets.
- the shape kernel may be generated by performing PCA on a dataset of neutral expressions.
- the expression kernel(s) may be generated by similarly performing PCA on datasets of various expression(s).
- the expression kernel(s) may be used to remove any facial expressions present in the mesh (to cause the mesh to depict or correspond to a neutral expression).
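- In such a linear statistical model, removing an expression reduces to zeroing the expression coefficients while preserving the shape (identity) coefficients, as sketched below with randomly generated bases standing in for the PCA kernels; the shapes and component counts are illustrative assumptions.

```python
import numpy as np

# Illustrative sizes: V vertices, Ks shape components, Ke expression components.
V, Ks, Ke = 468, 40, 10
mean_face = np.zeros((V, 3))
shape_basis = np.random.randn(Ks, V, 3) * 0.01  # stand-in for the PCA "shape kernel"
expr_basis = np.random.randn(Ke, V, 3) * 0.01   # stand-in for the "expression kernel"


def reconstruct(shape_coeffs: np.ndarray, expr_coeffs: np.ndarray) -> np.ndarray:
    """Rebuild a mesh from the statistical model's shape and expression coefficients."""
    return (mean_face
            + np.tensordot(shape_coeffs, shape_basis, axes=1)
            + np.tensordot(expr_coeffs, expr_basis, axes=1))


shape_c = np.random.randn(Ks)
expr_c = np.random.randn(Ke)
smiling_mesh = reconstruct(shape_c, expr_c)
# "Removing" the expression amounts to zeroing the expression coefficients
# while keeping the identity (shape) coefficients untouched.
neutral_mesh = reconstruct(shape_c, np.zeros(Ke))
```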
- removing facial expressions may result in improved measurement accuracy, as compared to taking measurements from a mesh depicting one or more facial expressions.
- the measurement system may avoid the need to request additional images from the user (e.g., asking the user to take another picture without smiling). This can improve user experience and reduce the time consumed by the measurement process.
- the measurement system generates one or more facial measurements (e.g., the measurements 140 of FIGS. 1 and/or 3) based on the mesh.
- the measurement system may generate measurements reflecting features such as the size and shape of the nose, the size and shape of the mouth, the positioning of the mouth relative to the nose, the size, shape, and positioning of the nostrils, the length of the conduit path or other circumferential measure around the user’s head (e.g., for the conduit sizing and/or headgear sizing), and the like.
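- Many of these measurements reduce to distances between known landmark vertices on the mesh, as in the sketch below; the landmark indices and the random vertex array are placeholders, not the disclosed measurement definitions.

```python
import numpy as np

# Hypothetical landmark indices into the mesh's vertex array.
LEFT_ALAR, RIGHT_ALAR = 1024, 2048   # outer edges of the nostrils
SELLION, SUPRAMENTON = 512, 4096     # nose bridge and chin point


def distance_mm(vertices: np.ndarray, i: int, j: int) -> float:
    """Euclidean distance between two mesh vertices (coordinates in mm)."""
    return float(np.linalg.norm(vertices[i] - vertices[j]))


def facial_measurements(vertices: np.ndarray) -> dict:
    return {
        "nose_width_mm": distance_mm(vertices, LEFT_ALAR, RIGHT_ALAR),
        "face_height_mm": distance_mm(vertices, SELLION, SUPRAMENTON),
    }


vertices = np.random.rand(5000, 3) * 100.0  # stand-in for a generated, scaled mesh
print(facial_measurements(vertices))
```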
- the measurement system selects one or more interface components, for the user, based on the facial measurements. For example, as discussed above, the measurement system may select one or more alternatives for each category of component (e.g., one or more conduits, one or more mask types and/or models, and the like) that are well-suited for the user, based on the determined measurements. Although not depicted in the illustrated example, in some embodiments, the measurement system may further select the interface component(s) based at least in part on user responses to survey questions, as discussed above. For example, if the user reports feelings of claustrophobia, the measurement system may select a nasal or pillow style interface.
- as another example, if the user indicates that they have difficulty breathing through their nose or that they tend to breathe through their mouth, the measurement system may select a full-face style interface.
- the measurement system may ask the user to engage in a nose breathing exercise (e.g., synchronizing their breathing with an animation), and then ask the user to report how well they could breathe (through their nose) during the exercise.
- the mask style may be selected based (at least in part) on the user response to this exercise.
- the measurement system is able to use machine learning to generate highly accurate three-dimensional meshes based on two-dimensional images, and then collect highly granular facial measurements based on the meshes. This can substantially improve the accuracy of the measurements, resulting in improved reliability in selecting appropriate interface components. As discussed above, these improved selections then enable improved respiratory therapy, such as through increased comfort (which may result in increased uptake or usage of the therapy), decreased air leak or other negative concerns, reduced difficulty or hassle in determining which equipment to select (which may increase the number of patients who decide to start therapy, as the barrier to entry is reduced), and the like. This can substantially improve results for a wide variety of users of respiratory therapy.
- FIG. 5 is a flow diagram depicting a method 500 for collecting and evaluating image data, according to some embodiments of the present disclosure.
- the method 500 may be performed by a measurement system, such as the measurement system 110 of FIG. 1 and/or the measurement systems discussed above with reference to FIGS. 2-4.
- some or all of the method 500 may be performed by other systems, such as locally by the user device (e.g., the user device 220 of FIG. 2) used to capture the images.
- the method 500 provides additional detail for block 405 of FIG. 4.
- the measurement system provides user instructions (e.g., the instructions 215 of FIG. 2) to the user.
- the instructions may take a number of forms, such as text instructions, pictorial instructions, video instructions, audio instructions, and the like.
- the instructions generally indicate what image(s) are needed, such as by illustrating example images that would be acceptable, instructing how the user should angle their head relative to the camera, and the like.
- the measurement system may provide instructions for each image independently. For example, the measurement system may provide instructions for a first image in the desired set, capture the image, and then provide instructions for the next image. In other embodiments, the measurement system may provide instructions for all images at once.
- the measurement system may similarly provide one or more questions or surveys to the user (e.g., to infer or determine whether they tend to breathe through their mouth, or to identify any prior interfaces that the user has stopped using). Such information may be useful in providing improved interface selection, as discussed above.
- the measurement system receives one or more user images (e.g., the images 105 of FIGS. 1-2) from the user device.
- the measurement system receives one or more individual images (e.g., the user device may capture one or more images, such as when the user indicates that they are ready).
- the measurement system receives a video segment (e.g., a stream or sequence of frames).
- the user device may record a video of the user following the instructions, allowing the measurement system to select the best image(s).
- the measurement system evaluates the received user image(s) to determine whether the image(s) satisfy one or more defined acceptance criteria.
- the particular criteria used may vary depending on the particular implementation. For example, in some embodiments, the measurement system may determine whether the image(s) are sufficiently high resolution, have sufficient contrast or clarity, have appropriate lighting, and the like.
- the measurement system may evaluate the image(s) to confirm whether the user followed the instructions appropriately.
- the measurement system may process the image(s) using one or more computer vision models trained to identify the presence of one or more landmarks or features, such as ear(s), eye(s), the mouth, the nose, and the like.
- the measurement system may use an ear detection model to confirm whether the user’s ear is visible.
- the particular operations used to train the machine learning models may vary depending on the particular architecture.
- the measurement system (or other training system) may process an image of a training sample as input to the model to generate a binary output (or a set of binary outputs) indicating whether one or more landmarks or features are present.
- the output may then be compared against the label (e.g., whether each landmark is, in fact, present) to generate a loss.
- the loss may generally use a variety of formulations, such as cross-entropy loss, depending on the particular implementation.
- the parameters of the detection model may then be updated (e.g., using backpropagation) based on the loss. In this way, the model learns to predict whether one or more landmarks or facial features are present in provided images.
- the model may be trained using individual samples (e.g., using stochastic gradient descent) and/or batches of samples (e.g., using batch gradient descent) over any number of iterations and/or epochs.
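- A toy version of such a landmark-presence detector and its training loop is sketched below (PyTorch assumed); the tiny convolutional architecture, image size, and synthetic labels are illustrative assumptions rather than the disclosed detection model.

```python
import torch
from torch import nn

# Small stand-in for the ear/landmark-visibility classifier described above.
detector = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 1),  # one logit: "is the ear visible?"
)
optimizer = torch.optim.Adam(detector.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()  # cross-entropy loss for the binary output

# Synthetic batch: 16 RGB images (64x64) with binary visibility labels.
images = torch.randn(16, 3, 64, 64)
labels = torch.randint(0, 2, (16, 1)).float()

for step in range(50):
    optimizer.zero_grad()
    logits = detector(images)
    loss = criterion(logits, labels)
    loss.backward()   # backpropagate the loss
    optimizer.step()  # update detector parameters
```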
- the measurement system determines (based on the evaluation) whether the criteria are satisfied. If not, the method 500 returns to block 505. In some embodiments, the measurement system provides additional instruction at block 505 based on the particular criteria that were not met. For example, the measurement system may specifically indicate that the lighting was poor, that the user was too far from the camera, that the angle was wrong, that the user should ensure their ear is visible, and the like.
- the method 500 continues to block 525, where the measurement system determines whether there are one or more additional images, in the desired set of images, which have not yet been provided. If so, the method 500 returns to block 505. If not, the method 500 continues to block 530.
- the measurement system may receive and/or evaluate some or all of the images in parallel. For example, in some embodiments, the measurement system receives a video of the user moving their head to each designated position (e.g., forward, left, right, up, and down), and may extract appropriate images from this video sequence.
- the measurement system optionally applies one or more preprocessing operations to the images.
- the preprocessing operation(s) may generally include any operations to facilitate or improve the machine learning process.
- the measurement system may adjust the contrast and/or brightness of the images, resize the images, crop the images, and the like.
- the measurement system may extract one or more features from the images (e.g., processing the image with a feature extraction machine learning model to generate one or more feature maps), as discussed above.
- FIGS. 6A and 6B illustrate a flow diagram depicting a method 600 for collecting user image data, according to some embodiments of the present disclosure.
- FIG. 6A depicts a method 600A
- FIG. 6B depicts a method 600B, where the methods 600A and 600B collectively form a method 600.
- the method 600 may be performed by a measurement system, such as the measurement system 110 of FIG. 1 and/or the measurement systems discussed above with reference to FIGS. 2-5.
- the method 600 provides additional detail for block 405 of FIG. 4 and/or for the method 500 of FIG. 5.
- the measurement system outputs an instruction indicating that the user’s head should be within a defined frame.
- the measurement system may output this instruction using a variety of techniques, such as via a graphical user interface (GUI), via audio, and the like.
- the measurement system may output textual instructions via the GUI, natural language speech audio indicating the instructions, and the like.
- the “frame” refers to a defined portion or region of the visual field of an imaging sensor being used to capture the user images (e.g., a smartphone camera).
- the measurement system may output real-time (or near-real-time) images captured by the imaging sensor via a display.
- the measurement system may output the captured image(s) via a GUI of the user’s smartphone, instructing the user to position themselves and/or the camera such that their head is located within the frame indicated on the GUI.
- the measurement system may perform various preprocessing operations, such as to evaluate one or more captured image(s) using machine learning to determine whether a single face is detected anywhere in the images (e.g., rather than multiple faces or no faces).
- the method 600 is performed while continuously or periodically capturing image(s) for evaluation using the various criteria discussed in more detail below.
- the measurement system determines whether the user’s head is in the desired frame in the captured image(s).
- the presence of the user’s head in a desired frame or location in the image(s) may be referred to as “frame criteria.” If not, the method 600 returns to block 602 to remind the user to place their head in the frame. If the user’s head is in the frame, the method 600 continues to block 606, where the measurement system determines whether one or more brightness criteria are satisfied by the captured image or stream of images. For example, the measurement system may determine whether the image(s) are too light or bright (e.g., washed out by direct sunlight) and/or too dark or dim (e.g., obscured by shadows or a dark room). In some aspects, the desired level of brightness may be referred to as “brightness criteria.”
- the method 600 continues to block 608, where the measurement system indicates the brightness criteria.
- the measurement system may output an indication that the images are too dark or too bright (as relevant), and suggest or instruct the user move to an area with better lighting, avoid shadows, avoid bright direct light, and the like.
- the method 600 then returns to block 606.
- the method 600 continues to block 610, where the measurement system determines whether the head of the user is depicted level in the image(s), relative to the imaging sensor. For example, the measurement system may determine whether the imaging sensor is being held level (or within a defined angle from level) with the user’s face (such that the images depict the head from straight on), below the user’s face (such that the images depict the head from below), and the like. In some aspects, this may be referred to as “level criteria.”
- If the measurement system determines that the user’s head is not level with the camera, the method 600 continues to block 612, where the measurement system indicates the level criteria. For example, the measurement system may output an indication that the user should position themselves and/or the imaging sensor such that the camera is level with the user’s face. The method 600 then returns to block 610.
- the method 600 continues to block 614, where the measurement system determines whether the head of the user is within a defined distance, relative to the imaging sensor. For example, the measurement system may determine whether the estimated distance between the head and the imaging sensor is above a minimum threshold and/or below a maximum threshold (e.g., based on the relative size of the user’s head in the captured images). In some aspects, this may be referred to as “distance criteria.”
- If the measurement system determines that the user’s head is too far from or too close to the camera, the method 600 continues to block 616, where the measurement system indicates the distance criteria. For example, the measurement system may output an indication that the user should move themselves and/or the imaging sensor closer together or further apart such that the user’s face fills the indicated frame in the images. The method 600 then returns to block 614.
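- For illustration only, the following sketch shows one possible way to implement the brightness and distance checks described above; the thresholds, and the assumption that a face bounding box is available from some face detector, are not taken from this disclosure.

```python
# Minimal sketch of brightness and distance criteria checks on 8-bit images.
import numpy as np

def brightness_ok(image: np.ndarray, low=60, high=200) -> bool:
    """Accept images whose mean grayscale intensity is neither too dark nor washed out."""
    gray = image.mean(axis=2) if image.ndim == 3 else image
    return low <= gray.mean() <= high

def distance_ok(face_box_height: int, image_height: int,
                min_fraction=0.3, max_fraction=0.8) -> bool:
    """Use the relative size of the detected face as a proxy for camera distance."""
    fraction = face_box_height / image_height
    return min_fraction <= fraction <= max_fraction
```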
- the method 600 continues to block 618, where the measurement system captures one or more frontal images of the user’s face.
- “capturing” the frontal images at block 618 may correspond to saving, selecting, storing, or otherwise retaining or flagging one or more images (e.g., the images captured just before and/or after determining that block 614 was satisfied) for subsequent evaluation (e.g., to predict facial measurements and/or select a user interface).
- the measurement system may output an indication that the frontal image(s) have been captured (e.g., via a camera shutter noise, a flash of white or green light on the screen, and the like). In some aspects, the measurement system may refrain from providing such indication (e.g., proceeding directly to block 620, such that the user may not be aware of which image(s) are evaluated for the measurement process).
- the measurement system instructs the user to turn their head towards the side (e.g., the right side or the left side, relative to the camera). For example, the measurement system may instruct the user to turn their head slowly to the left or right.
- the measurement system determines whether at least one of the user’s ears are visible in the most recently captured image (e.g., whether the full outline of the ear is visible, or whether the ear is obscured entirely or partially by hair, a hat, etc.). In some aspects, this may be referred to as “ear criteria.”
- If the measurement system determines that the ear criteria are not satisfied, the method 600 continues to block 624, where the measurement system indicates the ear criteria. For example, the measurement system may output an indication that the user’s ear(s) should be visible (e.g., the ear on the opposite side of the head to which the user is turning), and instruct or ask the user to remove any impediments. The method 600 then returns to block 620.
- the method 600 continues via the block 626 to block 628, depicted in FIG. 6B.
- the measurement system determines whether the user’s head is turned to at least a threshold angle relative to the imaging sensor (as depicted in the recently captured images). In some aspects, this is referred to as a “threshold angle criteria.” If the measurement system determines that the angle criteria are not met (e.g., the user’s head is not turned far enough to the side, or is turned too far to the side), the method 600 continues to block 630.
- the measurement system indicates the angle threshold. For example, the measurement system may output an indication that the user should turn their head further (e.g., away from the camera) or less (e.g., back towards the camera). The method 600 then returns to block 628.
- the method 600 continues to block 632, where the measurement system captures one or more side images of the user’s face.
- “capturing” the side images at block 632 may correspond to saving, selecting, storing, or otherwise retaining or flagging one or more images (e.g., the images captured just before and/or after determining that block 628 was satisfied) for subsequent evaluation (e.g., to predict facial measurements and/or select a user interface).
- the measurement system may output an indication that the side image(s) have been captured (e.g., via a camera shutter noise, a flash of white or green light on the screen, an audio statement indicating that the user turned their head far enough, and the like). In some aspects, the measurement system may refrain from providing such indication (e.g., proceeding directly to block 634, such that the user may not be aware of which image(s) are evaluated for the measurement process).
- the measurement system determines whether there is at least one additional side or angle for which the measurement system needs to capture image(s) for analysis. For example, if the user turned their head to the left, the measurement system may determine that the system still needs to capture image(s) of the user turning their head to the right (and vice versa). If the measurement system has not yet captured image(s) from the other side of the user’s head, the method 600 returns, via the block 636, to block 620 of FIG. 6A. In some aspects, if the measurement system relies on image(s) from a single side (e.g., the left or the right) rather than both sides, block 634 may be bypassed.
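- One hedged sketch of the threshold angle check and the corresponding feedback is shown below; it assumes a head-pose estimate (yaw in degrees) is available from some upstream model, and the angle limits are illustrative rather than prescribed.

```python
# Illustrative check of the "threshold angle criteria" for side images.
def angle_feedback(yaw_degrees: float, min_angle=35.0, max_angle=60.0) -> str:
    """Return guidance based on how far the head is turned from the frontal view."""
    magnitude = abs(yaw_degrees)
    if magnitude < min_angle:
        return "turn further"
    if magnitude > max_angle:
        return "turn back toward the camera"
    return "capture side image"
```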
- If the measurement system determines that it has captured images from both sides of the user’s head (or from one side, if only one side is needed), the method 600 continues to block 638.
- the measurement system instructs the user to tilt their head (e.g., upwards, relative to the camera).
- the measurement system determines whether the user’s head is tilted to at least a threshold angle relative to the imaging sensor (as depicted in the recently captured images). In some aspects, this is referred to as a “tilt criteria” or simply as a “threshold angle criteria,” as discussed above. If the measurement system determines that the angle or tilt criteria are not met (e.g., the user’s head is not turned or tilted far enough up, or is turned or tilted too far upwards), the method 600 continues to block 642.
- the measurement system indicates the angle or tilt threshold.
- the measurement system may output an indication that the user should turn their head further up (e.g., away from the camera) or not as far up (e.g., nearer to the camera).
- the method 600 then returns to block 638.
- the method 600 continues to block 644, where the measurement system captures one or more tilt images of the user’s face.
- “capturing” the tilt images at block 644 may correspond to saving, selecting, storing, or otherwise retaining or flagging one or more images (e.g., the images captured just before and/or after determining that block 640 was satisfied) for subsequent evaluation (e.g., to predict facial measurements and/or select a user interface).
- each instruction or indication provided during the method 600 may generally be provided in a variety of modalities, including via textual output on the display, audio (e.g., spoken) output via one or more speakers, haptic output, and the like.
- Although the illustrated example depicts a sequence of evaluations, in some aspects, some or all of the evaluations may be performed partially or entirely in parallel.
- the measurement system continuously or periodically captures images while the method 600 is performed, such that each evaluation is performed on one or more of the most recently captured images.
- capturing or recording images for further evaluation may correspond to saving the image(s) for future use, while the remaining images captured during the method 600 may be discarded.
- determining whether or not one or more of the indicated criteria are satisfied may include evaluating the criteria for a sequence of images (e.g., at least five images in a row), over a period of time (e.g., whether the criteria are satisfied after five seconds), and the like. In some aspects, determining that the criteria are satisfied may be performed based on a single image (e.g., allowing the method 600 to proceed quickly to the next step) while determining that the criteria are not satisfied may be performed based on a set of multiple images and/or a minimum amount of time without the criteria being satisfied.
- the measurement system may “capture” multiple images at each position (e.g., a sequence of images from the frontal view, from each side, and from the tilt angle). In some aspects, the measurement system may then select the best image(s) for downstream processing. For example, from each pool or set of images (e.g., for each angle), the measurement system may select the image having the least blur, the best lighting, and the like.
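- As a possible (non-authoritative) implementation of selecting the least blurry image from each pool, the sketch below scores candidates by the variance of the Laplacian; this scoring heuristic is an assumption for illustration.

```python
# Score each candidate frame by sharpness and keep the sharpest one.
import cv2
import numpy as np

def sharpness(image: np.ndarray) -> float:
    """Higher variance of the Laplacian generally indicates less blur."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var()

def select_best(images: list) -> np.ndarray:
    """Return the candidate with the highest sharpness score."""
    return max(images, key=sharpness)
```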
- FIG. 7 depicts an example workflow 700 for collecting user image data, according to some embodiments of the present disclosure.
- the workflow 700 may be used by a measurement system, such as the measurement system 110 of FIG. 1 and/or the measurement systems discussed above with reference to FIGS. 2-6.
- the workflow 700 is used to facilitate capture of user images, as discussed above.
- the workflow 700 is implemented by displaying user interfaces 705 (e.g., GUIs) via a display, such as on a smartphone, a laptop computer, a desktop computer, and the like.
- the workflow 700 is used to guide users to turn or tilt their heads to desired angles, as discussed above.
- the user interface 705A may include a frame 710 and one or more directional indicators 715.
- the frame 710 may generally correspond to any visual indication of where the user should position their head in the captured images.
- the measurement system may continuously capture images (e.g., a video or sequence of frames) and display the images on the user interface 705A, where the frame 710 may be superimposed over the stream and/or may occlude some or all of the images.
- the background of the user interface 705A may be opaque, and the frame 710 may be a window (e.g., an ellipse or other shape) showing a portion of the captured images. The user may be instructed to align their head with the frame 710.
- the directional indicators 715 may correspond to any visual indication of which direction the user should turn or tilt their head.
- the illustrated user interface 705A depicts the directional indicators 715 pointing from the frame 710 to the left (e.g., asking the user to turn their head to the left).
- the directional indicators 715 may be depicted pointing from the frame 710 to the right (e.g., asking the user to turn their head to the right) and/or pointing from the frame 710 upwards (e.g., asking the user to tilt their head up).
- the directional indicators 715 may generally take any form (including non-visual indicators, such as audio recordings indicating the direction).
- a threshold indicator 720 is depicted.
- the threshold indicator 720 may be used to visually indicate how far the user should turn their head in the indicated direction.
- a portion 725 of the directional indicators 715 is shaded (indicated by stippling) to indicate the amount that the user has turned their head, relative to the camera. That is, the measurement system may monitor the angle that the user’s head is facing relative to the camera, and may update the portion 725 to visually indicate the angle.
- the portion 725 may start at a first size (e.g., small or invisible) when the user is facing directly towards the camera. As the user’s head angle increases relative to the camera, the measurement system may expand the portion 725 to indicate the amount that the user’s head has turned relative to the frontal angle.
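- A simple sketch of how the shaded portion 725 could be sized from the estimated head angle is shown below; the threshold value and the linear mapping are illustrative assumptions.

```python
# Map the current yaw angle onto a 0-1 fill fraction of the directional indicator.
def indicator_fill(yaw_degrees: float, threshold_degrees: float = 45.0) -> float:
    """Fraction of the indicator to shade (0 = frontal, 1 = threshold reached)."""
    fraction = abs(yaw_degrees) / threshold_degrees
    return min(max(fraction, 0.0), 1.0)
```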
- the user has turned their head somewhat to the left, but has not yet turned their head far enough to satisfy the angle criteria (indicated by the threshold indicator 720).
- the interface is updated in real-time (or near- real-time) based on the determined angle of the user’s head.
- the measurement system may dynamically change the size of the portion 725 to reflect the current angle.
- the portions 725 may be implemented using any suitable technique.
- the measurement system may cause the portion 725 to be a different color than the directional indicators 715 (e.g., where the indicators are white and the portion 725 fills in a color such as green).
- the measurement system may output other indications such as audio indications (e.g., giving the current estimated angle, instructing the user to turn a bit further, and the like).
- the user has turned their head beyond the minimum threshold (indicated by the threshold indicator 720), as indicated by the shaded portion 725 of the directional indicators 715.
- the measurement system may capture one or more image(s) at this angle for subsequent evaluation.
- the background of the user interface 705C has been updated to reflect or indicate that the angle criteria have been satisfied.
- the measurement system may cause the background to change color (e.g., to green), to flash one or more colors or graphics, and the like.
- Such indications may help the user recognize that the image(s) have been captured and they can turn back toward the camera.
- the measurement system may use other indications such as audio output (e.g., a camera shutter sound effect or an oral statement that the images are captured) to assist the user.
- FIG. 8 is a flow diagram depicting a method 800 for improved interface selection, according to some embodiments of the present disclosure.
- the method 800 may be performed by a measurement system, such as the measurement system 110 of FIG. 1 and/or the measurement systems discussed above with reference to FIGS. 2-7.
- the method 800 provides additional detail for block 425 of FIG. 4.
- the measurement system selects a pillow size for the user by processing one or more nostril measurements using a trained machine learning model.
- the measurement system (or another system) may train a classifier model to classify nostril measurements into pillow sizes.
- the nostril measurements may include parameters such as the major and minor axes of an ellipse that corresponds to the nostril, the rotation of the ellipse or nostril, the distance between the ellipse or nostril and the centerline of the user’s nose, and the like.
- Using such user-specific measurements and machine learning can result in a pillow fitting that is far more comfortable and accurate (as well as far easier and more sanitary, as compared to a guess-and-check approach).
- the particular operations used to train the classifier machine learning model may vary depending on the particular architecture.
- the measurement system (or other training system) may process the nostril measurements of a training sample as input to the model to generate a classification (e.g., to select a pillow size).
- the classification may then be compared against the label (e.g., the actual pillow size appropriate and/or comfortable for the user, based on their nostrils) to generate a loss.
- the loss may generally use a variety of formulations, such as cross-entropy loss.
- the parameters of the model may then be updated (e.g., using backpropagation) based on the loss.
- the model learns to generate more accurate pillow size classifications based on input measurements.
- the model may be trained using individual samples (e.g., using stochastic gradient descent) and/or batches of samples (e.g., using batch gradient descent) over any number of iterations and/or epochs.
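- The following is a minimal, hypothetical sketch of such a classifier and one training update in PyTorch; the feature layout (four measurements per nostril) and the number of pillow sizes are assumptions for illustration, not specifics of this disclosure.

```python
# Hedged sketch of a pillow-size classifier over nostril measurements.
import torch
import torch.nn as nn

NUM_FEATURES = 8   # e.g., major axis, minor axis, rotation, centerline distance per nostril
NUM_SIZES = 3      # e.g., small / medium / large

classifier = nn.Sequential(
    nn.Linear(NUM_FEATURES, 32), nn.ReLU(),
    nn.Linear(32, NUM_SIZES),
)
optimizer = torch.optim.SGD(classifier.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Dummy training batch: nostril measurements and labeled pillow sizes.
measurements = torch.rand(16, NUM_FEATURES)
sizes = torch.randint(0, NUM_SIZES, (16,))

# One update: classify, compare against labels, backpropagate.
logits = classifier(measurements)
loss = loss_fn(logits, sizes)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```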
- the measurement system selects a conduit size based on one or more head measurements (e.g., measurements of the length of the conduit path from the user’s nose and/or mouth, up across the cheekbones, and over the user’s ears).
- the measurement system may select a dynamic conduit size based on the measurement (e.g., selecting conduit that is the same length as the conduit path), or may select one of a predefined set of alternative conduit sizes (e.g., using a defined mapping between facial measurements and conduit size, such as the mapping 315 of FIG. 3).
- the measurement system selects a headgear size based on one or more head measurements (e.g., measurements of the occipitofrontal circumference of the user’s head).
- the measurement system may fit a statistical shape model of a human head to the mesh, such that the circumference of the head can be estimated (even if the back of the user’s head is not imaged).
- the measurement system may select a dynamic headgear size based on the measurement (e.g., indicating to use headgear that is the same size as the head circumference), or may select one of a predefined set of alternative headgear sizes (e.g., using a defined mapping between facial measurements and headgear size, such as the mapping 315 of FIG. 3).
- the measurement system selects a user interface (e.g., a nasal mask, a full-face mask, and/or a nasal pillow mask) based on one or more head or facial measurements (e.g., measurements of the nose and/or mouth of the user). For example, the measurement system may select one of a predefined set of alternative interfaces (e.g., using a defined mapping between facial measurements and interfaces, such as the mapping 315 of FIG. 3). Using these user-specific measurements can result in a far improved interface fit, as compared to more generic or less accurate approaches.
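- Purely as an illustration of a mapping such as the mapping 315, the sketch below associates hypothetical measurement ranges with component sizes and looks up a size for a given measurement; the component names, ranges, and units are invented for the example.

```python
# Hypothetical mapping from facial measurements to component sizes.
SIZE_RANGES = {
    "headgear": [("small", 0, 540), ("medium", 540, 580), ("large", 580, 10_000)],
    "conduit":  [("short", 0, 250), ("standard", 250, 300), ("long", 300, 10_000)],
}

def select_size(component: str, measurement_mm: float) -> str:
    """Return the first size whose range contains the measurement."""
    for size, low, high in SIZE_RANGES[component]:
        if low <= measurement_mm < high:
            return size
    raise ValueError(f"No {component} size maps to {measurement_mm} mm")

# Example: pick headgear based on an estimated head circumference of 565 mm.
print(select_size("headgear", 565.0))
```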
- the measurement system may select multiple interface alternatives. For example, the measurement system may select one interface of each type (e.g., one nasal interface, one nasal pillow interface, and one full-face interface), or may select multiple alternatives within each type category.
- the particular category (or categories) for which the measurement system generates a selection may depend on user input. For example, the user may specify that they would like a recommended nasal mask.
- the measurement system may generate the selection based on predicted user preference or fit (e.g., based on the facial measurements). For example, the measurement system may determine or infer, based on the facial measurements, that a particular type or category of interface will likely be the most comfortable for the user.
- the measurement system may select the interface component(s) based at least in part on user responses to various questions or surveys. For example, the measurement system may select an interface type based on responses related to the user’s prior interface usage (e.g., if they already tried a pillow interface and did not like it), based on the user’s comfort level with various types, based on the user’s tendency to breathe through their mouth or their nose, and the like. As another example, the measurement system may select the interface type based on the user’s response to a breathing exercise (e.g., where the user is asked to breathe through their nose in synchronization with an animation), as discussed above. For example, the measurement system may select a full-face interface type for users who have difficulty breathing through their nose, and a nasal and/or pillow type for users who report potential claustrophobia with full face masks.
- the particular component(s) selected by the measurement system may vary depending on the particular task and implementation.
- Although the illustrated examples depict selecting a pillow size, a conduit size, a headgear size, and an interface model, the measurement system may select additional components not pictured, or may select a subset of the illustrated components, for the user.
- FIG. 9 is a flow diagram depicting a method 900 for using machine learning to select user interfaces, according to some embodiments of the present disclosure.
- the method 900 may be performed by a measurement system, such as the measurement system 110 of FIG. 1 and/or the measurement systems discussed above with reference to FIGS. 2-8.
- a set of two-dimensional images (e.g., the images 105 of FIGS. 1-2) of a user is accessed.
- a three-dimensional mesh (e.g., the mesh 130 of FIG. 1) depicting a head of the user is generated based on processing the set of two-dimensional images using a first machine learning model, wherein the three-dimensional mesh is scaled to a size of the head of the user.
- the three-dimensional mesh is modified to remove one or more facial expressions.
- a set of facial measurements (e.g., the measurements 140 of FIGS. 1 and/or 3) is determined based on the modified three-dimensional mesh.
- a user interface (e.g., the interface 150 of FIGS. 1 and/or 3) is selected for the user based on the set of facial measurements.
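- At a high level, the method 900 could be sketched as the following pipeline, where each step stands in for the corresponding component discussed above; the function names and signatures are placeholders rather than an actual API.

```python
# High-level, illustrative pipeline corresponding to the method 900.
def select_interface_for_user(images, mesh_model, measure_fn, mapping):
    """images -> scaled mesh -> neutral-expression mesh -> measurements -> interface."""
    mesh = mesh_model(images)                 # first ML model: 2D images to a scaled 3D mesh
    neutral_mesh = remove_expressions(mesh)   # modify the mesh to remove facial expressions
    measurements = measure_fn(neutral_mesh)   # facial measurements from the modified mesh
    return mapping(measurements)              # mapping/classifier selects the user interface

def remove_expressions(mesh):
    # Placeholder: in practice this would morph the mesh toward a neutral expression.
    return mesh
```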
- FIG. 10 depicts an example computing device 1000 configured to perform various embodiments of the present disclosure.
- the computing device 1000 may be implemented using virtual device(s), and/or across a number of devices (e.g., in a cloud environment).
- the computing device 1000 corresponds to a measurement system, such as the measurement system 110 of FIG. 1 and/or the measurement systems discussed above with reference to FIGS. 2-9.
- the computing device 1000 includes a CPU 1005, memory 1010, a network interface 1025, and one or more I/O interfaces 1020.
- the CPU 1005 retrieves and executes programming instructions stored in memory 1010, as well as stores and retrieves application data residing in one or more storage repositories (not depicted).
- the CPU 1005 is generally representative of a single CPU and/or GPU, multiple CPUs and/or GPUs, a single CPU and/or GPU having multiple processing cores, and the like.
- the memory 1010 is generally included to be representative of a random access memory.
- the computing device 1000 may include storage (not depicted) which may be any combination of disk drives, flash-based storage devices, and the like, and may include fixed and/or removable storage devices, such as fixed disk drives, removable memory cards, caches, optical storage, network attached storage (NAS), or storage area networks (SAN).
- I/O devices 1035 are connected via the I/O interface(s) 1020.
- the computing device 1000 can be communicatively coupled with one or more other devices and components (e.g., via a network, which may include the Internet, local network(s), and the like).
- the CPU 1005, memory 1010, network interface(s) 1025, and I/O interface(s) 1020 are communicatively coupled by one or more buses 1030.
- the memory 1010 includes an image component 1050, a mesh component 1055, a measurement component 1060, and a selection component 1065, which may perform one or more embodiments discussed above. Although depicted as discrete components for conceptual clarity, in embodiments, the operations of the depicted components (and others not illustrated) may be combined or distributed across any number of components. Further, although depicted as software residing in memory 1010, in embodiments, the operations of the depicted components (and others not illustrated) may be implemented using hardware, software, or a combination of hardware and software.
- the image component 1050 (which may correspond to the image component 115 of FIGS. 1-2) may be used to access, evaluate, and/or preprocess images (e.g., the images 105 of FIGS. 1-2), as discussed above.
- the image component 1050 may transmit or output instructions indicating how to capture the image(s), preprocess the image(s), and/or evaluate the image(s) to confirm that they meet acceptance criteria (e.g., whether an ear is visible in the profile image(s)).
- the mesh component 1055 (which may correspond to the mesh component 125 of FIG. 1) may be used to generate three-dimensional meshes (e.g., the mesh 130 of FIG. 1) based on two-dimensional images, as discussed above.
- the mesh component 1055 may process the image(s) using one or more deep learning models (or other machine learning models) trained based on image data and corresponding point cloud (or other three-dimensional data, such as mesh data) for user faces.
- the measurement component 1060 (which may correspond to the measurement component 135 of FIG. 1) may be used to generate facial measurements (e.g., the measurements 140 of FIGS. 1 and/or 3), as discussed above.
- the measurement component 1060 may collect measurements such as those defining the size, shape, positioning, and/or rotation of various facial landmarks or features, such as the mouth, the nose, the ears, the nostrils, and the like.
- the selection component 1065 (which may correspond to the selection component 145 of FIGS. 1 and/or 3) may be used to generate or select interface components (e.g., the interface 150 of FIGS. 1 and/or 3), as discussed above.
- the selection component 1065 may use mappings (such as the mappings 1070) to identify appropriate component(s) based on the facial measurements, and/or may process one or more of the measurements (e.g., the nostril parameters) using one or more secondary machine learning models to predict or identify the best interface component.
- the memory 1010 further includes mapping(s) 1070 and model parameter(s) 1075 for one or more machine learning models.
- the mappings 1070 (which may correspond to the mappings 315 of FIG. 3) generally include mappings indicating, for one or more interface components, a set of facial measurements (e.g., a range of measurements) for which the component is acceptable or suitable.
- the mappings 1070 may indicate, for one or more ranges of facial measurements, a set of interface components that are acceptable or suitable.
- the model parameters 1075 may generally include parameters for any number of models, such as a mesh generation model (e.g., a deep learning model used by the mesh component 1055 to generate facial meshes), a landmark or feature detection model (e.g., an ear detection model used by the image component 1050 to determine whether the profile images are acceptable), a component classifier model (e.g., the nostril pillow classifier model discussed above, used by the selection component 1065 to select pillow sizing), and the like.
- mappings 1070 and model parameters 1075 may be stored in any suitable location, including one or more local storage repositories, or in one or more remote systems distinct from the computing device 1000.
- an apparatus may be implemented or a method may be practiced using any number of the embodiments set forth herein.
- the scope of the disclosure is intended to cover such an apparatus or method that is practiced using other structure, functionality, or structure and functionality in addition to, or other than, the various embodiments of the disclosure set forth herein. It should be understood that any embodiment of the disclosure disclosed herein may be embodied by one or more elements of a claim.
- exemplary means “serving as an example, instance, or illustration.” Any embodiment described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments.
- a phrase referring to “at least one of” a list of items refers to any combination of those items, including single members.
- “at least one of: a, b, or c” is intended to cover a, b, c, a-b, a-c, b-c, and a-b-c, as well as any combination with multiples of the same element (e.g., a-a, a-a-a, a-a-b, a-a-c, a-b-b, a-c-c, b-b, b-b-b, b-b-c, c-c, and c-c-c or any other ordering of a, b, and c).
- determining encompasses a wide variety of actions. For example, “determining” may include calculating, computing, processing, deriving, investigating, looking up (e.g., looking up in a table, a database or another data structure), ascertaining and the like. Also, “determining” may include receiving (e.g., receiving information), accessing (e.g., accessing data in a memory) and the like. Also, “determining” may include resolving, selecting, choosing, establishing and the like.
- the methods disclosed herein comprise one or more steps or actions for achieving the methods.
- the method steps and/or actions may be interchanged with one another without departing from the scope of the claims.
- the order and/or use of specific steps and/or actions may be modified without departing from the scope of the claims.
- the various operations of methods described above may be performed by any suitable means capable of performing the corresponding functions.
- the means may include various hardware and/or software component(s) and/or module(s), including, but not limited to a circuit, an application specific integrated circuit (ASIC), or processor.
- Embodiments of the invention may be provided to end users through a cloud computing infrastructure.
- Cloud computing generally refers to the provision of scalable computing resources as a service over a network.
- Cloud computing may be defined as a computing capability that provides an abstraction between the computing resource and its underlying technical architecture (e.g., servers, storage, networks), enabling convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction.
- cloud computing allows a user to access virtual computing resources (e.g., storage, data, applications, and even complete virtualized computing systems) in “the cloud,” without regard for the underlying physical systems (or locations of those systems) used to provide the computing resources.
- cloud computing resources are provided to a user on a pay-per-use basis, where users are charged only for the computing resources actually used (e.g. an amount of storage space consumed by a user or a number of virtualized systems instantiated by the user).
- a user can access any of the resources that reside in the cloud at any time, and from anywhere across the Internet.
- a user may access applications or systems (e.g., measurement system 110 of FIG. 1) or related data available in the cloud.
- the measurement system could execute on a computing system in the cloud and train/use machine learning models to generate facial meshes and select interface components. In such a case, the measurement system could maintain the models in the cloud, and use them to drive improved interface recommendations. Doing so allows a user to access this information from any computing system attached to a network connected to the cloud (e.g., the Internet).
- Clause 1 A method comprising: accessing a set of two-dimensional images of a user; generating, based on processing the set of two-dimensional images using a first machine learning model, a three-dimensional mesh depicting a head of the user, wherein the three-dimensional mesh is scaled to a size of the head of the user; modifying the three-dimensional mesh to remove one or more facial expressions; determining a set of facial measurements based on the modified three-dimensional mesh; and selecting a user interface for the user based on the set of facial measurements.
- Clause 2 The method of Clause 1, wherein selecting the user interface comprises generating a recommended pillow size for the user interface based on a set of nostril measurements of the set of facial measurements.
- Clause 3 The method of Clause 2, wherein the set of nostril measurements define at least a first ellipse and comprise at least one of: (i) a major axis, (ii) a minor axis, (iii) a rotation, or (iv) a distance of the first ellipse from a center of a nose of the user.
- Clause 4 The method of any of Clauses 2-3, wherein generating the recommended pillow size comprises processing the set of nostril measurements using a second machine learning model.
- Clause 5 The method of any of Clauses 1-4, wherein selecting the user interface comprises generating a recommended conduit size for the user interface based on the set of facial measurements.
- Clause 6 The method of any of Clauses 1-5, wherein selecting the user interface comprises generating a recommended headgear size for the user interface based on the set of facial measurements.
- Clause 7 The method of Clause 6, wherein generating the recommended headgear size comprises fitting a statistical shape model of a human head to the three-dimensional mesh.
- Clause 8 The method of any of Clauses 1-7, wherein, prior to generating the three-dimensional mesh, at least one two-dimensional image of the set of two-dimensional images was processed using a second machine learning model to detect presence of an ear of the user in the at least one two-dimensional image.
- Clause 9 The method of any of Clauses 1-8, wherein the first machine learning model was trained based on a set of training images depicting a training user and a corresponding set of three-dimensional data points for a head of the training user.
- Clause 10 The method of Clause 9, wherein the first machine learning model does not use a camera model to generate the three-dimensional mesh.
- Clause 11 The method of any of Clauses 1-10, wherein the set of two-dimensional images comprise an image depicting a left side of the head of the user, an image depicting a right side of the head of the user, an image depicting a front of the head of the user, and an image depicting a bottom of the head of the user.
- Clause 12 The method of any of Clauses 1-11, further comprising, after selecting the user interface, deleting the set of two-dimensional images, the three-dimensional mesh, and the set of facial measurements.
- Clause 13 The method of any of Clauses 1-12, further comprising: providing one or more requests for information to the user, wherein the one or more requests for information ask the user to indicate whether they experience difficulty breathing through their nose; receiving, from the user, one or more responses to the one or more requests; and selecting the user interface based further on the one or more responses.
- Clause 14 The method of any of Clauses 1-13, wherein accessing the set of two-dimensional images comprises: capturing a first image depicting the user; in response to determining that the first image satisfies at least one of (i) a frame criteria, (ii) a brightness criteria, (iii) a level criteria, or (iv) a distance criteria, outputting an instruction to the user to turn the head of the user; capturing a second image depicting the user; and in response to determining that the second image satisfies a threshold angle criteria, adding the second image to the set of two-dimensional images.
- Clause 15 The method of Clause 14, wherein accessing the set of two-dimensional images further comprises, prior to capturing the second image: capturing a third image depicting the user; and in response to determining that the third image does not satisfy the threshold angle criteria, instructing the user to turn the head of the user further.
- Clause 16 The method of any of Clauses 14-15, wherein accessing the set of two-dimensional images further comprises, prior to capturing the second image: capturing a third image depicting the user; and in response to determining that the third image does not depict at least one ear of the user, instructing the user to make the ear of the user visible.
- Clause 17 The method of any of Clauses 14-16, wherein accessing the set of two-dimensional images further comprises, prior to capturing the second image: monitoring an angle to which the head of the user is turned; and outputting a visual indication of the angle relative to the threshold angle criteria.
- Clause 18 The method of any of Clauses 14-17, further comprising, in response to determining that the second image satisfies the threshold angle criteria, outputting a visual indication that the threshold angle criteria are satisfied.
- Clause 19 The method of any of Clauses 14-18, further comprising, in response to determining that the second image satisfies the threshold angle criteria, outputting an audio indication that the threshold angle criteria are satisfied.
- Clause 20 A system, comprising: a memory comprising computer-executable instructions; and one or more processors configured to execute the computer-executable instructions and cause the processing system to perform a method in accordance with any one of Clauses 1-19.
- Clause 21 A system, comprising means for performing a method in accordance with any one of Clauses 1-19.
- Clause 22 A non-transitory computer-readable medium comprising computer-executable instructions that, when executed by one or more processors of a processing system, cause the processing system to perform a method in accordance with any one of Clauses 1-19.
- Clause 23 A computer program product embodied on a computer-readable storage medium comprising code for performing a method in accordance with any one of Clauses 1-19.
Abstract
Techniques for improved machine learning are provided. A set of two-dimensional images of a user is accessed. A three-dimensional mesh depicting a head of the user is generated based on processing the set of two-dimensional images using a machine learning model, where the three-dimensional mesh is scaled to a size of the head of the user. The three-dimensional mesh is modified to remove one or more facial expressions. A set of facial measurements is determined based on the modified three-dimensional mesh, and a user interface is selected for the user based on the set of facial measurements.
Description
MACHINE LEARNING FOR THREE-DIMENSIONAL MESH GENERATION BASED ON IMAGES
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This Application claims the benefit of and priority to U.S. Provisional Patent Application No. 63/565,412, filed on March 14, 2024, and U.S. Provisional Patent Application No. 63/688,394, filed on August 29, 2024, the entire contents of which are incorporated herein by reference.
INTRODUCTION
[0002] Embodiments of the present disclosure generally relate to computer vision and machine learning. More specifically, embodiments relate to using machine learning to generate three-dimensional meshes based on image data.
[0003] In a wide variety of medical (and non-medical) settings, accurate facial or head measurements are relied upon to drive decisions and selections for the user. For example, in many cases, the particular dimensions of the face of the individual user are needed to help design, construct, and/or select an appropriate mask that will fit the user’s face comfortably and completely. As one example, continuous positive airway pressure (CPAP) machines generally use a mask or nosepiece to deliver constant and steady air pressure to users during sleep. However, for the system to operate properly (as well as to improve the user experience and health), it is important that the mask fit properly (e g., comfortably, and without leaks around the face).
[0004] In some conventional systems, users can visit a physical environment (e.g., the office of a healthcare provider or mask distributor) to try on various masks. However, this requires physical presence of the user, which is not always possible due to factors such as location or remoteness of the user and/or provider, ability to travel, available time for the user, and the like. Further, the user is limited to the available (preconfigured) mask sizes and dimensions, and generally must manually try a number of them to find a correct fit, which can present problems when sterility is required.
[0005] In some conventional systems, attempts have been made to measure or estimate facial dimensions of the user, in order to drive mask selection or design. For example, some
approaches involve the user measuring their own face, such as by using a ruler, a coin or other object with a known size, and the like. However, such approaches have proven to be inaccurate and frustrating to the user, leading to poor outcomes. For example, obtaining the wrong mask based on inaccurate measurements may lead to a mask that does not work well for its intended purpose, which can in turn affect a user’s condition, treatment, and outcomes.
[0006] Additionally, some approaches have attempted to improve on manual measurements by using specialized devices or systems, such as cameras configured to capture three-dimensional data including depth. These specialized devices are complex, expensive, and frequently unavailable. Further, they typically still require the user to physically travel to the location of the device (or require the device to be physically brought to the user), significantly limiting their use.
[0007] Improved systems and techniques to determine facial measurements are needed.
SUMMARY
[0008] According to one embodiment presented in this disclosure, a method is provided. The method includes: accessing a set of two-dimensional images of a user; generating, based on processing the set of two-dimensional images using a first machine learning model, a three- dimensional mesh depicting a head of the user, wherein the three-dimensional mesh is scaled to a size of the head of the user; modifying the three-dimensional mesh to remove one or more facial expressions; determining a set of facial measurements based on the modified three- dimensional mesh; and selecting a user interface for the user based on the set of facial measurements.
[0009] Other embodiments provide processing systems configured to perform the aforementioned methods as well as those described herein; non-transitory, computer-readable media comprising instructions that, when executed by one or more processors of a processing system, cause the processing system to perform the aforementioned methods as well as those described herein; a computer program product embodied on a computer readable storage medium comprising code for performing the aforementioned methods as well as those further described herein; and a processing system comprising means for performing the aforementioned methods as well as those further described herein.
BRIEF DESCRIPTION OF THE DRAWINGS
[0010] So that the manner in which the above recited features of the present disclosure can be understood in detail, a more particular description of the disclosure, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only exemplary embodiments and are therefore not to be considered limiting of its scope, for the disclosure may admit to other equally effective embodiments.
[0011] FIG. 1 depicts an example workflow for generating meshes and selecting user interfaces, according to some embodiments of the present disclosure.
[0012] FIG. 2 depicts an example workflow to facilitate image data collection for improved mesh generation, according to some embodiments of the present disclosure.
[0013] FIG. 3 depicts an example workflow for improved interface selection based on generated meshes, according to some embodiments of the present disclosure.
[0014] FIG. 4 is a flow diagram depicting a method for using machine learning model(s) to generate three-dimensional meshes and select interface components, according to some embodiments of the present disclosure.
[0015] FIG. 5 is a flow diagram depicting a method for collecting and evaluating image data, according to some embodiments of the present disclosure.
[0016] FIGS. 6A and 6B illustrate a flow diagram depicting a method for collecting user image data, according to some embodiments of the present disclosure.
[0017] FIG. 7 depicts an example user interface for collecting user image data, according to some embodiments of the present disclosure.
[0018] FIG. 8 is a flow diagram depicting a method for improved interface selection, according to some embodiments of the present disclosure.
[0019] FIG. 9 is a flow diagram depicting a method for using machine learning to select user interfaces, according to some embodiments of the present disclosure.
[0020] FIG. 10 depicts an example computing device configured to perform various embodiments of the present disclosure.
[0021] To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures. It is contemplated that elements and features of one embodiment may be beneficially incorporated in other embodiments without further recitation.
DETAILED DESCRIPTION
[0022] Embodiments of the present disclosure provide apparatuses, methods, processing systems, and computer-readable mediums for machine learning (ML)-based interface selections based on three-dimensional meshes.
[0023] In some embodiments, a measurement system is provided. The measurement system may be configured to evaluate a set of images captured via an imaging sensor (e.g., a webcam) using machine learning to generate a three-dimensional mesh corresponding to the face of a user depicted in the image(s). In some embodiments, such meshes may be used to generate accurate biometric measurements. Beneficially, the system can use two-dimensional images to generate the three-dimensional mesh, and need not rely on complex three-dimensional imaging systems. This enables the system to be implemented using a wide range of widely available imaging sensors, including web cameras, mobile device cameras, and the like. In some embodiments, a deep learning model is trained end-to-end to generate three-dimensional meshes having appropriate size or scale, relative to the user’s face. That is, the mesh may be scaled according to the size of the user’s face, ensuring that measurements taken based on the mesh (e.g., nose size, mouth position, and the like) are accurate.
[0024] In some embodiments, the user may be directed to move a body part, such as their face, through a range of motion while an imaging sensor captures images of the body part at different angles. Trained machine learning models may then be used to evaluate the image(s) to ensure they are satisfactory (e.g., to ensure the user’s ear is visible in one or more pictures) for accurate mesh generation.
[0025] In some embodiments, the measurement system may perform various operations using the generated mesh. In some embodiments, the measurement system may morph or deform the mesh to remove facial expressions, if any, depicted in the image. For example, if the user is smiling, raising their eyebrows, or making some other expression, the system may modify the mesh to remove such expression(s). A wide variety of facial expressions may result in inaccurate facial measurements (e.g., inaccurate nostril size prediction due to deformation
of the user’s skin when they smile). Therefore, by manipulating the three-dimensional mesh to remove such expressions, the measurement system can ensure that the captured measurement data is highly accurate.
[0026] In some embodiments, once the three-dimensional facial mesh has been generated and processed appropriately (e.g., to remove facial expressions), the mesh can be used to estimate, calculate, compute, or otherwise determine a set of facial measurements of the user. The particular measurements collected may vary depending on the particular implementation and task. For example, in some embodiments, the measurement system generates the facial measurements to facilitate selecting and/or fitting of one or more devices or components designed to be worn on the head or face of the user, such as user interfaces (e.g., masks) for respiratory therapy (e.g., CPAP).
[0027] For example, in some embodiments, the measurement system may determine measurements such as the face height, nose width, nose depth, nostril size, and the like. In various embodiments, the particular facial measurements that are determined and used may vary depending on the particular task (e.g., to allow determination of proper sizing for conduits (e.g., for tube-up masks), head gear, nostril sizes for pillow masks, and the like). As discussed below in more detail, these measurements can be used to select, design, customize, or otherwise retrieve a facial device or user interface for the user, such as an appropriately-fitted mask for the user, to ensure functionality, comfort, and stability.
Example Workflow for Generating Meshes and Selecting User Interfaces
[0028] FIG. 1 depicts an example workflow 100 for generating meshes and selecting user interfaces, according to some embodiments of the present disclosure.
[0029] In the illustrated example, a measurement system 110 accesses a set of image(s) 105 and generates or selects an interface 150 (e.g., a recommendation or selection of the interface 150) based on the image(s) 105. As used herein, “accessing” data may generally include receiving, retrieving, requesting, obtaining, collecting, capturing, measuring, or otherwise gaining access to the data. For example, in some embodiments, the image(s) 105 may be captured via one or more imaging sensors (e.g., a webcam or a camera on a smartphone) and may be transmitted to the measurement system 110 via one or more communication links. In some embodiments, the measurement system 110 is implemented as a cloud-based service that evaluates user images 105 to generate the interfaces 150. For
example, users may use an application to capture the image(s) 105 on their local devices (e.g., the user’s laptop or phone), and may then upload the image(s) 105 to the measurement system 110 for evaluation. Generally, the measurement system 110 may be implemented using hardware, software, or a combination of hardware and software. Further, though illustrated as a discrete system for conceptual clarity, in some embodiments, the operations of the measurement system 110 may be combined or distributed across any number and variety of devices and systems.
[0030] In some embodiments, the image(s) 105 generally correspond to two-dimensional images that depict the head and/or face of the user. In some embodiments, the image(s) 105 depict the user from multiple angles or orientations. For example, the image(s) 105 may include a frontal image (e.g., captured while the user’s face is angled directly towards the imaging sensor, such that the image depicts the face of the user from straight on), one or more side or profile images (e.g., captured while the user turned their face towards the left and/or right side of the imaging sensor, such that the image(s) depict the side of the user’s face and/or the user’s ear(s)), a bottom image (e.g., captured while the user looked upwards relative to the imaging sensor, such that the image depicts the user’s chin, neck, and/or nostrils), and/or a top image (e.g., captured while the user looked downwards relative to the imaging sensor, such that the image depicts the top of the user’s head).
[0031] Although not depicted in the illustrated example, in some aspects, the measurement system 110 may receive various other data, such as metadata associated with one or more images 105 (e.g., indicating characteristics such as the field of view (FOV) or focal length of the camera that captured the image(s) 105).
[0032] In the illustrated workflow 100, the image(s) 105 are accessed by an image component 115. The image component 115 generally facilitates collection and evaluation of the images 105. The particular operations performed by the image component 115 may vary depending on the particular implementation. For example, in some embodiments, the image component 115 may perform various preprocessing operations, such as to enhance contrast, reduce noise, resize the images, crop the images, perform color correction on the images, perform feature extraction, and the like. In some embodiments, the image component 115 may evaluate one or more of the image(s) 105 to confirm that the image(s) 105 are suitable for mesh generation. For example, the image component 115 may use one or more machine learning models to detect the presence (or absence) of various facial features or landmarks that are
useful in mesh generation, such as the ear(s) of the user in the profile image(s). In some embodiments, if such landmarks are not visible, the image component 115 may request additional image(s) 105 to improve the measurement process.
[0033] As illustrated, the image component 115 provides image data 120 to a mesh component 125. The image data 120 may correspond to or comprise the image(s) 105 themselves, and/or may correspond to the image(s) 105 after preprocessing operation(s) are applied, such as noise reduction or resizing. In some embodiments, the image data 120 corresponds to or comprises the results of feature extraction. That is, the image data 120 may comprise feature map(s) generated for the image(s) 105.
[0034] The mesh component 125 processes the image data 120 to generate a mesh 130. In some embodiments, the mesh component 125 uses one or more machine learning models to generate the mesh 130. For example, the mesh component 125 may use a deep learning model (e.g., a convolutional neural network) to generate the mesh 130. In some embodiments, the mesh component 125 uses the machine learning model(s) to fit a statistical shape model representing a statistically average face to the image data 120, causing the mesh 130 to depict or correspond to the face of the user.
[0035] In some embodiments, the mesh component 125 uses a camera model to scale the mesh 130. For example, in some embodiments, the mesh component 125 may determine the FOV of the camera used to capture the image(s) 105 (e.g., from metadata associated with the images 105, and/or by processing the images themselves). In some embodiments, the perceived size of various facial landmarks may change as the landmarks move closer to or further from the camera. Based on the perceived changes in size of the landmark(s) (e.g., the user’s head, or more granular landmarks such as eyes or ears), in some embodiments, the mesh component 125 can use a camera model to determine or infer the FOV of the camera and/or the distance between the camera and the landmark(s). Using this information, in some embodiments, the mesh component 125 can determine the scale of the face or features therein. For example, after determining that the user’s nose is N millimeters away from the camera and that the FOV of the camera is X degrees, the mesh component 125 may determine the actual size of the user’s nose (e.g., in millimeters).
[0036] In some embodiments, rather than using a camera model to generate or scale the mesh, the mesh component 125 may use a deep learning model that generates an appropriately
scaled mesh 130. For example, the model may be trained based on facial exemplars to generate the mesh 130 in a way that inherently understands the scale of the face, without using a separate camera model (e.g., without explicitly determining or evaluating the FOV of the camera). In some embodiments, to train the model, the measurement system 110 (or another system) may use relatively dense exemplars, such as images and corresponding meshes or dense coordinates of facial landmarks.
[0037] In some embodiments, some or all of the training data comprises synthetic data. For example, accurate three-dimensional meshes or models of synthetic heads and/or faces may be generated using various computer programs (e.g., models of people that are not real individuals or users, but where the models are nevertheless realistic). In some embodiments, the measurement system 110 (or another system) may render image(s) depicting the modeled head from various angles (e.g., by placing a virtual camera in the virtual space at various positions around the head). These images may be used as the training input to the model, paired with some or all of the mesh itself (e.g., data points in three-dimensional space, such as defining various landmarks of the face) used as the target output. In some embodiments, in addition to or instead of using synthetic data to train the model, the training data may include real data (e.g., real images of a user, coupled with highly accurate three-dimensional data points). For example, users may volunteer to use a scanning device capable of capturing image data and three-dimensional positioning data for their face.
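As a simplified sketch of how synthetic (image, landmark) training pairs might be produced, the snippet below projects the vertices of a stand-in synthetic head mesh from several virtual camera angles. A real pipeline would render shaded images rather than bare projections; the mesh, camera parameters, and angles here are placeholders.

```python
import numpy as np

def project_vertices(vertices_mm, yaw_deg, distance_mm=600.0, f_px=1500.0, center=(960, 540)):
    """Rotate a synthetic head mesh about the vertical axis and project it with a pinhole camera."""
    yaw = np.radians(yaw_deg)
    rot = np.array([[np.cos(yaw), 0.0, np.sin(yaw)],
                    [0.0,         1.0, 0.0],
                    [-np.sin(yaw), 0.0, np.cos(yaw)]])
    cam = vertices_mm @ rot.T
    cam[:, 2] += distance_mm                       # place the head in front of the virtual camera
    return f_px * cam[:, :2] / cam[:, 2:3] + center  # perspective divide to pixel coordinates

# One training pair per virtual camera angle: the rendered/projected view is the input,
# and the 3D vertices (in mm) are the target the mesh model learns to reproduce.
head = np.random.rand(5023, 3) * 200 - 100         # stand-in for a synthetic head mesh
pairs = [(project_vertices(head, yaw), head) for yaw in (-60, -30, 0, 30, 60)]
```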
[0038] However, such scanning devices are often cumbersome, expensive, and difficult to use. Further, scanning actual faces of users to train the model may implicate various privacy concerns. Experimentation has shown that using purely synthetic data to train the model can nevertheless provide robust mesh generation during runtime.
[0039] In the illustrated workflow 100, the mesh 130 is a three-dimensional mesh representing at least a portion of the user’s head and face. For example, the mesh 130 may depict the user’s face, a portion of their neck, and/or a portion of their head (e.g., including the ears). The mesh 130 is accessed by a measurement component 135. In the illustrated example, the measurement component 135 generates a set of measurements 140 based on the mesh 130.
[0040] In some embodiments, prior to generating the measurements 140, the measurement component 135 may apply one or more preprocessing operations. For example, the measurement component 135 may morph or deform the mesh 130 to remove the facial
expression(s) of the user, if any, resulting in a mesh that reflects the face of the user in a neutral expression.
[0041] In embodiments, as discussed above, the particular measurements captured by the measurement component 135 may vary depending on the particular implementation and task. For example, in some aspects, the measurement component 135 may evaluate the mesh 130 to determine features such as the nose width, nose height, and/or nose depth of the user. Such measurements may be useful to select or provide facial devices such as a face mask (e.g., a respiratory therapy mask) that covers the nose of the user. As another example, in some embodiments, the measurement component 135 may measure the height and/or width of the user’s mouth, and/or the positioning of the mouth relative to the nose, for similar reasons.
[0042] In some embodiments, the measurement component 135 may determine the overall size of the user’s head (e.g., the circumference of the user’s head), which may be a useful metric for conduit and/or headgear sizing (e.g., to select a conduit that is sufficiently large to comfortably reach around the user’s head without being too large such that it is uncomfortable, and/or to select headgear that will fit comfortably). As used herein, “headgear” refers to the straps, bands, or other components used to secure the user interface to the user’s nose, mouth, or both. As used herein, the “conduit” refers to a tube that connects the user interface (e.g., a CPAP mask) to the respiratory therapy device (e.g., the flow generator) and provides airflow to the user, from the flow generator, via the interface. In some aspects, to facilitate conduit sizing, the measurement component 135 may determine the length of the conduit path along the user’s face (e.g., along the path where the conduit is designed to sit, such as from the nose and/or mouth and up over each ear).
[0043] In some embodiments, to enable conduit sizing, the measurement component 135 may construct or identify a number of points, on the face and/or head of the user (as reflected by the mesh 130), where the conduit should lie when in use. For example, the measurement component 135 may identify a point under the nose (e.g., in the middle of the user’s philtrum), one or more points on each cheek (e.g., at a defined location, such as relative to the nostril or another facial feature, such as the temporal process of the zygomatic bone), a point between the ear and the eye (e.g., at the midline between the user’s left ear and left eye, as well as between the user’s right ear and right eye), a point on the top (e.g., uppermost point) of the user’s head, and the like. In some aspects, the measurement component 135 may similarly identify or place intermediate points on the surface of the mesh in between the above-
referenced landmark points. The measurement component 135 may then connect the point(s) with a spline (lying on the surface of the mesh), and measure the length of the spline. In some embodiments, the length of this spline may then be used to determine or infer the appropriate conduit size, as discussed below.
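A minimal sketch of the spline-length measurement, assuming the landmark points have already been sampled from the mesh surface (the surface-constrained fitting described above is simplified here to a spline through those sampled points, and the coordinates are placeholders):

```python
import numpy as np
from scipy.interpolate import splprep, splev

def conduit_path_length_mm(points_mm: np.ndarray, samples: int = 500) -> float:
    """Fit a smooth spline through ordered landmark points (N x 3, in mm) and return its arc length."""
    tck, _ = splprep(points_mm.T, s=0.0)              # interpolating spline through the points
    u = np.linspace(0.0, 1.0, samples)
    x, y, z = splev(u, tck)
    curve = np.stack([x, y, z], axis=1)
    return float(np.sum(np.linalg.norm(np.diff(curve, axis=0), axis=1)))

# Placeholder landmarks: philtrum, cheek point, ear/eye midpoint, side of head, top of head (mm).
landmarks = np.array([[0, -30, 95], [45, -10, 80], [80, 20, 30], [60, 90, 0], [0, 130, 20]], float)
print(conduit_path_length_mm(landmarks))
```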
[0044] In some embodiments, to enable headgear sizing, the measurement component 135 may construct or identify a number of points, on the face and/or head of the user (as reflected by the mesh 130), in a similar manner to the above-discussed conduit spline (where the particular points may differ). For example, the measurement component 135 may identify points located in locations where the various straps or other components of the headgear will fit on the user’s head. The measurement component 135 may then construct a spline connecting these points, as discussed above. In some aspects, the measurement component 135 may similarly generate multiple such splines along the mesh surface (e.g., for each portion or strap of the headgear), measuring the length of each headgear spline. In some embodiments, the length of these headgear spline(s) may then be used to determine or infer the appropriate headgear size, as discussed below.
[0045] In some embodiments, the measurement component 135 may determine the nostril size of the user based on the mesh 130. For example, the measurement component 135 may characterize the nostril(s) of the user using four parameters defining an ellipse: the major axis and minor axis of the ellipse, the rotation of the nostril/ellipse relative to a fixed orientation (e.g., relative to the plane of the face), and the distance between the nostril/ellipse and the centerline of the mesh 130 (e.g., the centerline of the user’s face). Although some examples discussed use an ellipse to define the nostril measurements, in some embodiments, the measurement component 135 may use a variety of polygons having any number of sides to define the shape of the nostril.
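One way the four ellipse parameters described above could be approximated from a nostril outline is a principal component analysis of the boundary points, sketched below under the assumption that the outline has already been extracted and expressed in the nostril plane. The scaling of the axis lengths is exact only for a uniformly sampled ellipse and is otherwise an approximation.

```python
import numpy as np

def nostril_ellipse_params(boundary_xy: np.ndarray, face_centerline_x: float = 0.0):
    """Approximate a nostril outline (N x 2 points in the nostril plane) with an ellipse.

    Returns (major_axis, minor_axis, rotation_deg, distance_to_centerline)."""
    center = boundary_xy.mean(axis=0)
    centered = boundary_xy - center
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered.T))      # ascending eigenvalues
    minor, major = 2.0 * np.sqrt(2.0 * eigvals)                # full axis lengths (approximate)
    rotation = np.degrees(np.arctan2(eigvecs[1, 1], eigvecs[0, 1]))  # orientation of the major axis
    distance = abs(center[0] - face_centerline_x)
    return major, minor, rotation, distance

theta = np.linspace(0, 2 * np.pi, 100)
outline = np.stack([12 + 6 * np.cos(theta), 3 * np.sin(theta)], axis=1)  # synthetic nostril outline
print(nostril_ellipse_params(outline))  # approximately (12, 6, 0, 12)
```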
[0046] In the illustrated example, the measurements 140 (also referred to in some embodiments as facial measurements) are accessed by a selection component 145. The selection component 145 evaluates the measurements 140 to select or generate the interface 150. In some embodiments, the selection component 145 may evaluate one or more of the measurements 140 using one or more thresholds or mappings to select various components of the interface 150. For example, based on the nose size and/or shape of the user, the selection component 145 may evaluate predefined mappings indicating which interface(s) will fit best or be most comfortable. As one example, the selection component 145 may determine that a
first nasal-only mask may be too small to comfortably fit, that a second nasal-only mask will be too large (e.g., such that air leak occurs), and/or that a third nasal-only mask will fit well and be comfortable. As another example, based on the measurements 140, the selection component 145 may determine that the user should use a particular type or model of full-face masks (e.g., an oronasal mask).
[0047] As another example, for conduit sizing, the selection component 145 may evaluate the conduit spline length (discussed above) using a rules-based or threshold-based approach (e.g., selecting a conduit size based on the range into which the spline length falls), and/or may process the spline length using a machine learning model (e.g., a trained classifier) to select the conduit size. As yet another example, for headgear sizing, the selection component 145 may evaluate the length(s) of the headgear spline(s) (discussed above) using a rules-based or threshold-based approach (e.g., selecting a headgear size based on the range into which the spline lengths fall), and/or may process the spline length(s) using a machine learning model (e.g., a trained classifier) to select the headgear size.
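A minimal sketch of the threshold-based alternative, assuming the spline length has already been measured; the length ranges and size names below are invented placeholders, not values from the disclosure.

```python
# Placeholder length ranges (mm) -> conduit size; real ranges would come from product design data.
CONDUIT_SIZE_RANGES = [
    (0.0, 430.0, "small"),
    (430.0, 480.0, "medium"),
    (480.0, float("inf"), "large"),
]

def select_conduit_size(spline_length_mm: float) -> str:
    """Threshold-based conduit sizing: return the size whose range contains the spline length."""
    for lower, upper, size in CONDUIT_SIZE_RANGES:
        if lower <= spline_length_mm < upper:
            return size
    raise ValueError("spline length outside all configured ranges")

print(select_conduit_size(455.0))  # -> "medium"
```

The same pattern applies to headgear sizing, with one range table per headgear spline.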
[0048] In some embodiments, the selection component 145 may evaluate some or all of the measurements 140 to select a nasal pillow size for the user. Nasal pillows are generally soft inserts that fit partially into the nostrils of the user, providing airflow via the nostrils (whereas a nasal mask fits over the nose, and a full-face mask fits over the nose and mouth). In some embodiments, the selection component 145 uses a classifier machine learning model to select the pillow size based on the nostril measurements. For example, the classifier may process the measurements such as nostril major and minor axes, rotation, and/or distance to centerline to generate a classification indicating which size pillow would fit the user best. In some embodiments, the classifier may be a relatively small or simple machine learning model. The measurement system 110 (or another system) may train the nostril classifier using labeled exemplars. For example, the training data may include nostril measurements (as discussed above) of one or more users, where the label for each training sample indicates the pillow size that the user found most comfortable (or that otherwise led to the best results, such as the minimum air leakage).
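A hedged sketch of such a small classifier using scikit-learn; the training rows, labels, and model choice are placeholders standing in for the labeled exemplars described above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Each row: [major_axis_mm, minor_axis_mm, rotation_deg, distance_to_centerline_mm].
# Labels: pillow size that worked best for that user (placeholder training data).
X = np.array([[10.5, 5.0, 12.0, 9.0],
              [12.0, 6.5, 10.0, 10.5],
              [14.0, 7.5, 15.0, 11.0],
              [15.5, 8.0, 14.0, 12.0]])
y = np.array(["small", "medium", "large", "large"])

pillow_classifier = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(pillow_classifier.predict([[13.0, 7.0, 12.0, 10.8]]))  # predicted pillow size for a new user
```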
[0049] Generally, the generated or selected interface 150 may include selections for a variety of interface components, as discussed above. For example, the interface 150 may indicate a recommended user interface style or design (e.g., nasal only, full-face, or nasal pillow), a recommended model or size of interface (e.g., from a set of alternative options), a
recommended conduit sizing, a recommended headgear sizing, a recommended pillow size (for nasal pillow masks), and the like.
[0050] In some embodiments, the measurement system 110 may delete the user image(s) 105 and/or mesh 130 after processing in order to preserve user privacy. For example, in some embodiments, once the mesh 130 is generated, the measurement system 110 may delete the images 105 and image data 120. Further, once the measurements 140 are generated, the measurement system 110 may delete the mesh 130. Additionally, in some embodiments, once the interface 150 is generated, the measurement system 110 may delete the measurements 140.
[0051] In some embodiments, the measurement system 110 can provide the selected interface 150 to the user depicted in the images 105. In some embodiments, the interface 150 is indicated to another user, such as a healthcare provider of the depicted user, who can facilitate ordering and/or delivery of the indicated equipment. In these ways, the measurement system 110 can use machine learning to generate accurate three-dimensional meshes, and then evaluate these meshes to select or recommend equipment for respiratory therapy in a highly granular way. This can improve the results achieved by users (e.g., improving the progress of the therapy) while reducing or eliminating negative outcomes (e.g., discomfort due to poorly fitted masks, substantial air leak, and the like).
Example Workflow to Facilitate Image Data Collection for Improved Mesh Generation
[0052] FIG. 2 depicts an example workflow 200 to facilitate image data collection for improved mesh generation, according to some embodiments of the present disclosure. In some embodiments, the workflow 200 may be performed by a measurement system, such as the measurement system 110 of FIG. 1.
[0053] As illustrated, the image component 115 of the measurement system comprises an evaluation component 205 and a preprocessing component 210. Although depicted as discrete components for conceptual clarity, in some embodiments, the operations of the depicted components (and others not illustrated) may be combined or distributed across any number of components. In the depicted workflow 200, the image component 115 provides one or more instructions 215 to a user device 220. The instructions 215 may generally include any information to facilitate collection of user images.
[0054] For example, the instructions 215 may include textual instructions, pictorial instructions, video instructions, audio instructions, and the like. In some embodiments, the instructions 215 may indicate how the user should position themselves (e.g., by superimposing an ellipse or a human head or face over a live feed from the camera of the user device 220). For example, the instructions 215 may instruct the user to look at the camera, to turn their head to either side of the camera, to look up, and the like. As one example, in some embodiments, the instructions 215 may include depicting or superimposing a box and/or two ellipses (one for each nostril), over the image(s) from the device’s camera, and text instructing the user to position their nostrils within the box and/or in the ellipses, and to capture the image when their nostrils are arranged appropriately. Such instructions 215 may enable improved image capture.
[0055] In some aspects, the instructions 215 may include requesting that the user perform a breathing exercise to determine how well the user can breathe through their nose. For example, the instructions 215 may cause the user device 220 to output an animation via a display, and ask the user to breathe (through their nose) in synchronization with the animation. The particular contents of the animation may vary depending on the particular implementation. For example, the animation may include one or more circles (or other shapes) expanding and contracting, asking the user to inhale as the shape(s) expand, and exhale as the shape(s) contract. After one or more such breathing cycles, the instructions 215 may ask the user to indicate whether they were able to breathe comfortably during the exercise (or to rate their level of comfort).
[0056] Generally, the instructions 215 may be provided using any number and variety of communication links. For example, if the image component 115 operates in a cloud deployment, the instructions 215 may be transmitted via one or more wired and/or wireless networks (including the Internet) to the user device 220. The user device 220 is generally representative of any computing device that a user may use to capture and/or provide image(s) 105 to the image component 115. For example, the user device 220 may correspond to a laptop computer, a desktop computer, a smartphone, a tablet, and the like. In some embodiments, the user device 220 comprises one or more imaging sensors (e.g., cameras) integrated into the device or available as an external device (e.g., a plugin webcam).
[0057] Although not depicted in the illustrated example, in some embodiments, the image component 115 (or another component of the measurement system 110) may provide one or
more questions or surveys via the user device 220 to help guide the interface selection process. For example, in some embodiments, the user may be asked whether they have used any other interfaces within a defined period of time (e.g., the last thirty days), and if so, the user may be asked to provide further information such as the model or type of the prior interface(s), the model or type of their current interface, and/or a reason for why they switched (e.g., because they could not get a good seal, because the prior interface was uncomfortable, because they had facial markings or irritation, because they felt claustrophobic with the old interface, because air was leaking and/or they were mouth breathing, because the mask would not stay in place, and the like). In some embodiments, the system may similarly ask the user to indicate whether they initiated the switch (as compared to, for example, their healthcare provider suggesting a switch). Such information may be useful to suggest a new interface for the user (e.g., to select full face, pillow, or nasal mask based on their responses and/or prior interface usage). For example, if the user indicated feelings of claustrophobia while using a full face mask, the measurement system 110 may suggest a nasal or pillow interface.
[0058] In some embodiments, the image component 115 (or another component of the measurement system 110) may provide questions related to whether the user breathes through their mouth or otherwise has difficulty breathing through their nose. For example, the user may be asked whether they experience a variety of common concerns (e.g., dry mouth, nasal congestion or irritation, and the like). As another example, the system may ask the user whether they have noticed any air leak from their current interface (if they are already participating in therapy), whether they breathe through their mouth when using the therapy, whether they find themselves breathing through their mouth when exerting themselves (e.g., when walking up stairs), whether they find it easier to breathe through their mouth or their nose when asked to take a deep breath, and whether they have any medical conditions that make breathing through the nose difficult (such as the common cold, chronic sinusitis, chronic allergies, a deviated septum, and the like).
[0059] In some embodiments, such questions (to determine whether the user tends to breathe through their mouth) may be useful to allow the measurement system to select a good interface recommendation, as discussed above. For example, in addition to recommending specific sizes or models, the measurement system may further recommend specific types based on the user responses (e.g., suggesting a full face mask for users who have difficulty breathing through their nose or who otherwise tend to breathe through their mouth).
[0060] As illustrated, the user device 220 transmits one or more image(s) 105 to the image component 115. In the illustrated workflow 200, the preprocessing component 210 may first perform one or more preprocessing operations on the images 105. For example, as discussed above, the preprocessing component 210 may resize the images 105 to a standard or default size, and/or perform a variety of operations such as contrast enhancement and noise reduction to improve the machine learning process.
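A minimal OpenCV sketch of such a preprocessing pass (resize, local contrast enhancement, and denoising); the target size and filter parameters are illustrative assumptions rather than values from the disclosure.

```python
import cv2
import numpy as np

def preprocess(image_bgr: np.ndarray, size=(512, 512)) -> np.ndarray:
    """Resize to a standard size, boost local contrast, and reduce noise before inference."""
    resized = cv2.resize(image_bgr, size, interpolation=cv2.INTER_AREA)
    # Contrast enhancement on the luminance channel only (CLAHE), then back to BGR.
    lab = cv2.cvtColor(resized, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    lab = cv2.merge([clahe.apply(l), a, b])
    enhanced = cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)
    return cv2.fastNlMeansDenoisingColored(enhanced, None, 5, 5, 7, 21)
```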
[0061] In the illustrated example, the evaluation component 205 may evaluate the images 105 (or the preprocessed image data generated by the preprocessing component 210) to determine whether the images 105 are acceptable. For example, the evaluation component 205 may use various machine learning models to detect whether the user’s face is depicted in the image(s) 105, whether there is sufficient lighting, and the like. In some embodiments, the evaluation component 205 uses a machine learning model trained to identify or detect whether ear(s) are depicted in an image 105. For example, the evaluation component 205 may process the image(s) 105 corresponding to when the user turned left and/or right in order to determine whether the user’s ear(s) are visible. Such landmarks may be useful to improve the mesh generation, as it may enable more accurate shape and sizing of the model head, which can improve headgear sizing. In some embodiments, if the evaluation component 205 determines that one or more image(s) 105 are not acceptable, the image component 115 can send a new set of instructions 215 to the user asking them to try again (e.g., to take the profile picture again, but move their hair back and out of the way).
[0062] In the illustrated workflow 200, this process may be repeated any number of times until an acceptable set of images 105 is obtained.
[0063] Although depicted as being performed by the image component 115 of a measurement system, in some aspects, some or all of the discussed operations may be performed locally on the user device 220. For example, the ear detection machine learning model may be a lightweight classifier that can be executed by the user device 220 to detect the ear visibility locally, allowing the user to immediately capture another image if needed. This may reduce network bandwidth consumed by the process (e.g., reducing the number of images 105 transmitted across the network) as well as reducing computational expense on the measurement system.
[0064] Although not depicted in the illustrated workflow 200, in some embodiments, once an acceptable set of images 105 has been generated, the image component 115 may provide the images 105 (or image data generated therefrom, such as feature maps) to one or more other components of the measurement system (e.g., the mesh component 125 of FIG. 1), as discussed above.
Example Workflow for Improved Interface Selection based on Generated Meshes
[0065] FIG. 3 depicts an example workflow 300 for improved interface selection based on generated meshes, according to some embodiments of the present disclosure. In some embodiments, the workflow 300 may be performed by a measurement system, such as the measurement system 110 of FIG. 1.
[0066] As illustrated, the selection component 145 of the measurement system comprises a mapping component 305 and a classifier component 310. Although depicted as discrete components for conceptual clarity, in some embodiments, the operations of the depicted components (and others not illustrated) may be combined or distributed across any number of components. In the depicted workflow 300, the selection component 145 accesses the set of measurements 140 (e.g., facial measurements generated by a measurement component, such as the measurement component 135 of FIG. 1, based on a three-dimensional facial mesh depicting a user, such as the mesh 130 of FIG. 1).
[0067] As discussed above, the measurements 140 generally include one or more measurements indicating the size, shape, and/or positioning of one or more facial landmarks in three-dimensional space, such as the relative size, shape, and/or position of the user’s eyes, nose, mouth, ears, nostrils, and the like.
[0068] In the illustrated example, the selection component 145 also accesses a set of mappings 315. In some embodiments, the mappings 315 generally indicate prior sizing of one or more components of user interfaces. That is, the mappings 315 may indicate, for one or more components (e.g., conduit sizes, interface types or models, headgear sizes, and the like) a range of measurements for which the component was designed and/or which the component will fit appropriately. For example, the mappings 315 may indicate that a first nasal-only mask is best for users having a first set of nose measurements, while a second nasal-only mask is better for users having a second set of nose measurements. In some embodiments, the mappings 315 may be defined or provided by the designers or manufacturers of the therapy
components, and/or may be determined based on user interactions (e.g., surveying users to determine which component(s) they prefer).
[0069] In some embodiments, the mapping component 305 may evaluate some or all of the measurements 140 using the mappings 315 to select appropriate interface components. For example, as discussed above, the mapping component 305 may select one or more alternatives that align with the measurements 140, such as one or more interfaces, one or more conduits, one or more headgear sizes, and the like.
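As an illustration of this range-based lookup (with invented measurement ranges and component names), the sketch below returns every component whose design ranges contain the user's measurements, so overlapping ranges naturally yield multiple suitable alternatives, as discussed below.

```python
# Placeholder fit ranges: component -> {measurement_name: (min_mm, max_mm)}.
MAPPINGS = {
    "nasal_mask_A": {"nose_width": (28.0, 36.0), "nose_height": (40.0, 50.0)},
    "nasal_mask_B": {"nose_width": (34.0, 44.0), "nose_height": (46.0, 58.0)},
}

def matching_components(measurements: dict) -> list:
    """Return every component whose design ranges contain all of the user's measurements."""
    matches = []
    for component, ranges in MAPPINGS.items():
        if all(lo <= measurements[name] <= hi for name, (lo, hi) in ranges.items()):
            matches.append(component)
    return matches

print(matching_components({"nose_width": 35.0, "nose_height": 47.0}))  # both masks overlap here
```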
[0070] In the illustrated example, the classifier component 310 may similarly evaluate some or all of the measurements to select appropriate interface components. In some embodiments, as discussed above, the classifier component 310 may process the measurement data using one or more machine learning models in order to select the components. For example, the classifier component 310 may process nostril measurements (e.g., the major and minor axes of an ellipse corresponding to the nostril, the rotation of the nostril, and/or the distance between the nostril and the center of the nose) using a classifier model to select a pillow size for the user.
[0071] As illustrated, the selection component 145 generates an interface 150 based on the measurements 140. Although a single interface 150 (e.g., a single set of components) is depicted for conceptual clarity, in some embodiments, the selection component 145 may generate a set of alternatives. For example, the selection component 145 may generate a first interface 150 for a nasal-only style (e.g., recommending a particular interface, conduit, and headgear if the user wants to use a nasal mask), a second interface 150 for a full-face style (e.g., recommending a particular interface, conduit, and headgear if the user wants to use a full-face mask), and/or a third interface 150 for a pillow style (e.g., recommending a particular interface, conduit, headgear, and pillow size if the user wants to use a pillow mask).
[0072] Similarly, in some embodiments, the selection component 145 may indicate alternatives within the same style or type of mask. For example, suppose the mappings 315 include overlapping ranges of measurements for one or more components. In some embodiments, if the user’s measurements 140 lie in the overlapping region(s), the interface 150 may indicate that any of the alternatives may be suitable.
[0073] In these ways, the selection component 145 can provide substantially improved interface selection for users, resulting in improved therapy outcomes and comfort.
Example Method for Using Machine Learning Model(s) to Generate Three-Dimensional Meshes and Select Interface Components
[0074] FIG. 4 is a flow diagram depicting a method 400 for using machine learning model(s) to generate three-dimensional meshes and select interface components, according to some embodiments of the present disclosure. In some embodiments, the method 400 may be performed by a measurement system, such as the measurement system 110 of FIG. 1 and/or the measurement systems discussed above with reference to FIGS. 2-3.
[0075] At block 405, the measurement system accesses a set of image(s) (e.g., the images 105 of FIG. 1 and/or FIG. 2). In some embodiments, as discussed above, the images generally depict the head and/or face of a user. In some embodiments, the images are provided in order to enable the measurement system to generate a three-dimensional mesh corresponding to the user’s face, allowing the measurement system (or another system) to capture highly accurate measurements relating to the size, shape, positioning, and/or orientation of various facial features or landmarks, such as the nose, nostrils, eyes, mouth, ears, and the like. In some aspects, as discussed above, accessing the images includes evaluating the images to confirm that they meet defined acceptance criteria, such as a minimum size, a minimum resolution, a minimum amount of lighting and/or contrast, and the like. One example method for accessing the images is discussed in more detail below with reference to FIG. 5.
[0076] At block 410, the measurement system generates a mesh (e.g., the mesh 130 of FIG. 1) based on processing the accessed image(s) (or image data generated therefrom) using one or more machine learning models (e.g., a deep learning model). For example, as discussed above, the measurement system (or another system) may train a machine learning model using training samples, each sample comprising one or more images (depicting a respective user) as the input and a corresponding set of three-dimensional data points for a set of landmarks on the user’s face (or a mesh of the respective user’s face) used as the target or label. In some embodiments, as discussed above, some or all of the training samples may comprise synthetic data (e.g., synthetic or artificial face meshes used as the label, with rendered images of the meshes used as the input).
[0077] Generally, the particular operations used to train the machine learning model may vary depending on the particular architecture. For example, in some embodiments, the measurement system (or other training system) may process the image(s) of a training sample as input to the model (e.g., a deep learning convolutional neural network) to generate a mesh.
The mesh may then be compared against the label (e.g., the actual mesh or other data points in three-dimensional space) to generate a loss. The loss may generally use a variety of formulations, such as surface-to-surface loss, point-to-point loss, surface normal loss, Laplacian regularization loss, and the like. In some embodiments, the parameters of the model may then be updated (e.g., using backpropagation) based on the loss. In this way, the model learns to generate more accurate output meshes based on input images. In embodiments, the model may be trained using individual samples (e.g., using stochastic gradient descent) and/or batches of samples (e.g., using batch gradient descent) over any number of iterations and/or epochs.
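A minimal PyTorch-style sketch of one such training step, using only a point-to-point vertex loss for brevity; the encoder architecture, vertex count, and random tensors below are stand-ins, not the disclosed model.

```python
import torch
import torch.nn as nn

class ImageToMesh(nn.Module):
    """Stand-in encoder: a real implementation would use a convolutional backbone."""
    def __init__(self, n_vertices: int = 5023):
        super().__init__()
        self.backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256), nn.ReLU(),
                                      nn.Linear(256, n_vertices * 3))
        self.n_vertices = n_vertices

    def forward(self, images):
        return self.backbone(images).view(-1, self.n_vertices, 3)

model = ImageToMesh()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

images = torch.randn(8, 3, 64, 64)          # batch of rendered or real face images
target_mesh = torch.randn(8, 5023, 3)       # ground-truth vertex positions (e.g., synthetic meshes)

predicted = model(images)
loss = torch.mean(torch.sum((predicted - target_mesh) ** 2, dim=-1))  # point-to-point loss
optimizer.zero_grad()
loss.backward()
optimizer.step()
```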
[0078] In some embodiments, once trained (e.g., once the model reaches a desired level of accuracy, once no additional training samples are available, once a defined number of training iterations or a defined amount of computing resources have been spent, and the like), the model can be used for runtime mesh generation based on input user images.
[0079] In some embodiments, as discussed above, the mesh generated by the machine learning model is already scaled to the size of the user’s head and/or facial features. That is, because the model may be trained end-to-end using relatively dense labels (e.g., dense point clouds and/or meshes), the model may inherently learn to predict the scale of the face, without separately predicting how the camera affects the perceived size (e.g., based on FOV and/or distance). For example, in some embodiments, the machine learning model learns to respect or recreate the identity (e.g., facial shape) of the person regardless of any variations in angle, background, FOV of the camera, distance of the camera, and the like (in a similar manner to how some facial recognition models work). In this way, the generated mesh may be inherently scaled correctly by the model.
[0080] In some embodiments, if the mesh is not inherently scaled, the measurement system may then scale the output mesh using a camera model, as discussed above. For example, the measurement system may use the camera model and/or the FOV of the camera (if known) to predict the appropriate size for the mesh (or features therein). For example, objects further from the camera are perceived as smaller, relative to objects nearer to the camera. Therefore, the measurement system may evaluate the change(s) in perceived size of one or more facial landmarks (e.g., the user’s ears, nose, mouth, and the like) across the images in order to predict the FOV of the camera, the distance to the landmark(s), and/or the actual size of the feature(s). This allows the measurement system to scale the mesh accurately. In some embodiments, use
of a camera model may refer to using a perspective projection technique that projects the mesh from world space to camera space. If the parameters of the camera (e.g., FOV) are known, one or more projected keypoints can be compared with the ground truth keypoint locations (on the image) to determine the appropriate scaling of the mesh.
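One simple (assumed) way to realise this comparison is a search over candidate scale factors applied to the head-centred keypoints, keeping the scale whose pinhole projection best matches the observed 2D keypoints. The sketch below assumes the camera focal length and the head's translation in front of the camera have been estimated separately; all numbers are illustrative.

```python
import numpy as np

def project(points_cam, f_px, center_px):
    """Pinhole projection of camera-space points (N x 3, z pointing away from the camera)."""
    return f_px * points_cam[:, :2] / points_cam[:, 2:3] + center_px

def best_scale(mesh_keypoints, translation, observed_px, f_px, center_px,
               candidates=np.linspace(0.8, 1.2, 81)):
    """Scale the head-centred keypoints (not their distance) and keep the scale whose
    projection best matches the observed 2D keypoints."""
    errors = []
    for s in candidates:
        projected = project(s * mesh_keypoints + translation, f_px, center_px)
        errors.append(np.mean(np.linalg.norm(projected - observed_px, axis=1)))
    return candidates[int(np.argmin(errors))]

# Toy check: observations generated at 1.1x the mesh scale should recover roughly 1.1.
keypoints = np.array([[-30.0, -20.0, 10.0], [30.0, -20.0, 10.0], [0.0, 25.0, -5.0]])
translation = np.array([0.0, 0.0, 500.0])
center = np.array([960.0, 540.0])
observed = project(1.1 * keypoints + translation, f_px=1500.0, center_px=center)
print(best_scale(keypoints, translation, observed, 1500.0, center))  # ~1.1
```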
[0081] At block 415, the measurement system removes facial expression(s) from the mesh, if any are present. For example, the measurement system may deform the mesh to place the face in a neutral position (e.g., to remove expressions such as smiling, an open mouth, raised eyebrows, and the like). In some embodiments, to remove facial expressions, a statistical model comprising a shape kernel (e.g., indicating a statistically average head shape) and one or more expression kernels (e.g., indicating various facial expressions) may be used. The kernel(s) generally correspond to statistical models generated using principal component analysis (PCA) on facial datasets. For example, for the shape model, the kernel may be generated by performing PCA on a dataset of neutral expressions. For the expression model(s), the kernel(s) may be generated by similarly performing PCA on datasets of various expression(s). In some embodiments, the expression kernel(s) may be used to remove any facial expressions present in the mesh (to cause the mesh to depict or correspond to a neutral expression). As discussed above, removing facial expressions may result in improved measurement accuracy, as compared to taking measurements from a mesh depicting one or more facial expressions. Further, by removing the expression dynamically from the mesh itself, the measurement system may avoid the need to request additional images from the user (e.g., asking the user to take another picture without smiling). This can improve user experience and reduce the time consumed by the measurement process.
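A hedged sketch of expression removal under a linear PCA model: fit shape and expression coefficients to the mesh, then rebuild it with the expression coefficients zeroed. The mean shape and basis matrices below are random stand-ins for kernels learned from real face datasets.

```python
import numpy as np

def remove_expression(vertices, mean_shape, shape_basis, expr_basis):
    """Fit shape + expression coefficients to the mesh, then rebuild it with the
    expression coefficients set to zero (a neutral-expression mesh)."""
    basis = np.concatenate([shape_basis, expr_basis], axis=1)        # (3N, n_shape + n_expr)
    residual = (vertices - mean_shape).reshape(-1)                   # flatten to 3N
    coeffs, *_ = np.linalg.lstsq(basis, residual, rcond=None)
    shape_coeffs = coeffs[:shape_basis.shape[1]]                     # keep identity, drop expression
    neutral = mean_shape.reshape(-1) + shape_basis @ shape_coeffs
    return neutral.reshape(vertices.shape)

# Stand-in PCA bases; a real system would use kernels learned from facial datasets.
n_vertices, n_shape, n_expr = 5023, 50, 20
rng = np.random.default_rng(0)
mean_shape = rng.standard_normal((n_vertices, 3))
shape_basis = rng.standard_normal((3 * n_vertices, n_shape))
expr_basis = rng.standard_normal((3 * n_vertices, n_expr))
mesh = (mean_shape.reshape(-1) + shape_basis @ rng.standard_normal(n_shape)
        + expr_basis @ rng.standard_normal(n_expr)).reshape(n_vertices, 3)
print(remove_expression(mesh, mean_shape, shape_basis, expr_basis).shape)  # (5023, 3)
```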
[0082] At block 420, the measurement system generates one or more facial measurements (e.g., the measurements 140 of FIGS. 1 and/or 3) based on the mesh. For example, as discussed above, the measurement system may generate measurements reflecting features such as the size and shape of the nose, the size and shape of the mouth, the positioning of the mouth relative to the nose, the size, shape, and positioning of the nostrils, the length of the conduit path or other circumferential measure around the user’s head (e.g., for the conduit sizing and/or headgear sizing), and the like.
[0083] At block 425, the measurement system selects one or more interface components, for the user, based on the facial measurements. For example, as discussed above, the measurement system may select one or more alternatives for each category of component (e.g.,
one or more conduits, one or more mask types and/or models, and the like) that are well-suited for the user, based on the determined measurements. Although not depicted in the illustrated example, in some embodiments, the measurement system may further select the interface component(s) based at least in part on user responses to survey questions, as discussed above. For example, if the user reports feelings of claustrophobia, the measurement system may select a nasal or pillow style interface. As another example, if the user reports difficulty breathing through their nose and/or a tendency to breathe through their mouth, the measurement system may select a full-face style interface. As yet another example, the measurement system may ask the user to engage in a nose breathing exercise (e.g., synchronizing their breathing with an animation), and then ask the user to report how well they could breathe (through their nose) during the exercise. The mask style may be selected based (at least in part) on the user response to this exercise. One example method for selecting the interface components is discussed in more detail below with reference to FIG. 8.
[0084] In these ways, using the method 400, the measurement system is able to use machine learning to generate highly accurate three-dimensional meshes based on two- dimensional images, and then collect highly granular facial measurements based on the meshes. This can substantially improve the accuracy of the measurements, resulting in improved reliability in selecting appropriate interface components. As discussed above, these improved selections then enable improved respiratory therapy, such as through increased comfort (which may result in increased uptake or usage of the therapy), decreased air leak or other negative concerns, reduced difficulty or hassle in determining which equipment to select (which may increase the number of patients who decide to start therapy, as the barrier to entry is reduced), and the like. This can substantially improve results for a wide variety of users of respiratory therapy.
Example Method for Collecting and Evaluating Image Data
[0085] FIG. 5 is a flow diagram depicting a method 500 for collecting and evaluating image data, according to some embodiments of the present disclosure. In some embodiments, the method 500 may be performed by a measurement system, such as the measurement system 110 of FIG. 1 and/or the measurement systems discussed above with reference to FIGS. 2-4. In some embodiments, some or all of the method 500 may be performed by other systems, such as locally by the user device (e.g., the user device 220 of FIG. 2) used to capture the
images. In some embodiments, the method 500 provides additional detail for block 405 of
FIG. 4.
[0086] At block 505, the measurement system provides user instructions (e.g., the instructions 215 of FIG. 2) to the user. Generally, as discussed above, the instructions may take a number of forms, such as text instructions, pictorial instructions, video instructions, audio instructions, and the like. The instructions generally indicate what image(s) are needed, such as by illustrating example images that would be acceptable, instructing how the user should angle their head relative to the camera, and the like. In some embodiments, the measurement system may provide instructions for each image independently. For example, the measurement system may provide instructions for a first image in the desired set, capture the image, and then provide instructions for the next image. In other embodiments, the measurement system may provide instructions for all images at once.
[0087] In some embodiments, as discussed above, the measurement system may similarly provide one or more questions or surveys to the user (e.g., to infer or determine whether they tend to breathe through their mouth, or to identify any prior interfaces that the user has stopped using). Such information may be useful in providing improved interface selection, as discussed above.
[0088] At block 510, the measurement system receives one or more user images (e.g., the images 105 of FIGS. 1-2) from the user device. In some embodiments, as discussed above, the measurement system receives one or more individual images (e.g., the user device may capture one or more images, such as when the user indicates that they are ready). In some embodiments, rather than a set of images, the measurement system receives a video segment (e.g., a stream or sequence of frames). For example, the user device may record a video of the user following the instructions, allowing the measurement system to select the best image(s).
[0089] At block 515, the measurement system evaluates the received user image(s) to determine whether the image(s) satisfy one or more defined acceptance criteria. The particular criteria used may vary depending on the particular implementation. For example, in some embodiments, the measurement system may determine whether the image(s) are sufficiently high resolution, have sufficient contrast or clarity, have appropriate lighting, and the like.
[0090] In some embodiments, as discussed above, the measurement system may evaluate the image(s) to confirm whether the user followed the instructions appropriately. For example,
the measurement system may process the image(s) using one or more computer vision models trained to identify the presence of one or more landmarks or features, such as ear(s), eye(s), the mouth, the nose, and the like. As one example for the profile image(s), the measurement system may use an ear detection model to confirm whether the user’s ear is visible.
[0091] Generally, the particular operations used to train the machine learning models may vary depending on the particular architecture. For example, in some embodiments, the measurement system (or other training system) may process an image of a training sample as input to the model to generate a binary output (or a set of binary outputs) indicating whether one or more landmarks or features are present. The output may then be compared against the label (e.g., whether each landmark is, in fact, present) to generate a loss. The loss may generally use a variety of formulations, such as cross-entropy loss, depending on the particular implementation. In some embodiments, the parameters of the detection model may then be updated (e.g., using backpropagation) based on the loss. In this way, the model learns to predict whether one or more landmarks or facial features are present in provided images. In embodiments, the model may be trained using individual samples (e.g., using stochastic gradient descent) and/or batches of samples (e.g., using batch gradient descent) over any number of iterations and/or epochs.
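A minimal PyTorch-style sketch of one training step for a lightweight landmark-presence classifier (here, ear visibility) with a binary cross-entropy loss; the architecture, image size, and random batch are stand-ins, not the disclosed model.

```python
import torch
import torch.nn as nn

# Lightweight binary classifier: does the image show a fully visible ear?
ear_detector = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(8, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 1),
)
optimizer = torch.optim.Adam(ear_detector.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()                  # binary cross-entropy on the raw logit

images = torch.randn(16, 3, 128, 128)               # stand-in batch of profile images
labels = torch.randint(0, 2, (16, 1)).float()       # 1 = ear visible, 0 = ear obscured

logits = ear_detector(images)
loss = criterion(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```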
[0092] At block 520, the measurement system determines (based on the evaluation) whether the criteria are satisfied. If not, the method 500 returns to block 505. In some embodiments, the measurement system provides additional instruction at block 505 based on the particular criteria that were not met. For example, the measurement system may specifically indicate that the lighting was poor, that the user was too far from the camera, that the angle was wrong, that the user should ensure their ear is visible, and the like.
[0093] If, at block 520, the measurement system determines that the criteria are satisfied, the method 500 continues to block 525, where the measurement system determines whether there are one or more additional images, in the desired set of images, which have not yet been provided. If so, the method 500 returns to block 505. If not, the method 500 continues to block 530. Although the illustrated method 500 depicts a sequential process for conceptual clarity (e.g., iteratively receiving and evaluating each image in turn), in some embodiments, the measurement system may receive and/or evaluate some or all of the images in parallel. For example, in some embodiments, the measurement system receives a video of the user moving
their head to each designated position (e.g., forward, left, right, up, and down), and may extract appropriate images from this video sequence.
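A simple OpenCV sketch of pulling candidate frames from such a recorded video so that each head position can be evaluated; the sampling rate and file name are placeholders.

```python
import cv2

def sample_frames(video_path: str, every_nth: int = 10):
    """Read a recorded capture session and keep every Nth frame for evaluation."""
    capture = cv2.VideoCapture(video_path)
    frames, index = [], 0
    while True:
        ok, frame = capture.read()
        if not ok:                     # end of the video (or a read error)
            break
        if index % every_nth == 0:
            frames.append(frame)
        index += 1
    capture.release()
    return frames

frames = sample_frames("capture_session.mp4")  # placeholder path
print(len(frames))
```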
[0094] At block 530, the measurement system optionally applies one or more preprocessing operations to the images. As discussed above, the preprocessing operation(s) may generally include any operations to facilitate or improve the machine learning process. For example, the measurement system may adjust the contrast and/or brightness of the images, resize the images, crop the images, and the like. In some embodiments, as discussed above, the measurement system may extract one or more features from the images (e.g., processing the image with a feature extraction machine learning model to generate one or more feature maps), as discussed above.
Example Method for Collecting User Image Data
[0095] FIGS. 6A and 6B illustrate a flow diagram depicting a method 600 for collecting user image data, according to some embodiments of the present disclosure. Specifically, FIG. 6A depicts a method 600A, while FIG. 6B depicts a method 600B, where the methods 600A and 600B collectively form a method 600. In some embodiments, the method 600 may be performed by a measurement system, such as the measurement system 110 of FIG. 1 and/or the measurement systems discussed above with reference to FIGS. 2-5. In some embodiments, the method 600 provides additional detail for block 405 of FIG. 4 and/or for the method 500 of FIG. 5.
[0096] With reference now to FIG. 6A, at block 602, the measurement system outputs an instruction indicating that the user’s head should be within a defined frame. Generally, the measurement system may output this instruction using a variety of techniques, such as via a graphical user interface (GUI), via audio, and the like. For example, the measurement system may output textual instructions via the GUI, natural language speech audio indicating the instructions, and the like. In some aspects, the “frame” refers to a defined portion or region of the visual field of an imaging sensor being used to capture the user images (e.g., a smartphone camera). In some aspects, the measurement system may output real-time (or near-real-time) images captured by the imaging sensor via a display. For example, the measurement system may output the captured image(s) via a GUI of the user’s smartphone, instructing the user to position themselves and/or the camera such that their head is located within the frame indicated on the GUI.
[0097] In some aspects, prior to instructing the user to ensure their head is within the frame, the measurement system may perform various preprocessing operations, such as to evaluate one or more captured image(s) using machine learning to determine whether a single face is detected anywhere in the images (e.g., rather than multiple faces or no faces). In some aspects, the method 600 is performed while continuously or periodically capturing image(s) for evaluation using the various criteria discussed in more detail below.
[0098] At block 604, the measurement system determines whether the user’s head is in the desired frame in the captured image(s). In some aspects, the presence of the user’s head in a desired frame or location in the image(s) may be referred to as “frame criteria.” If not, the method 600 returns to block 602 to remind the user to place their head in the frame. If the user’s head is in the frame, the method 600 continues to block 606, where the measurement system determines whether one or more brightness criteria are satisfied by the captured image(s) or stream of images. For example, the measurement system may determine whether the image(s) are too light or bright (e.g., washed out by direct sunlight) and/or too dark or dim (e.g., obscured by shadows or a dark room). In some aspects, the desired level of brightness may be referred to as “brightness criteria.”
[0099] If the brightness criteria are not met, the method 600 continues to block 608, where the measurement system indicates the brightness criteria. For example, the measurement system may output an indication that the images are too dark or too bright (as relevant), and suggest or instruct the user move to an area with better lighting, avoid shadows, avoid bright direct light, and the like. The method 600 then returns to block 606.
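A minimal sketch of such a brightness check on the live frames; the luminance thresholds and the feedback strings are illustrative assumptions.

```python
import cv2
import numpy as np

MIN_MEAN_LUMA, MAX_MEAN_LUMA = 60.0, 200.0   # illustrative thresholds on 0-255 luminance

def brightness_feedback(frame_bgr: np.ndarray):
    """Return None if the frame's mean luminance is acceptable, otherwise a user-facing hint."""
    luma = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY).mean()
    if luma < MIN_MEAN_LUMA:
        return "The image is too dark. Please move to an area with better lighting."
    if luma > MAX_MEAN_LUMA:
        return "The image is too bright. Please avoid direct light."
    return None
```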
[00100] If, at block 606, the measurement system determines that the brightness criteria are satisfied, the method 600 continues to block 610, where the measurement system determines whether the head of the user is depicted level in the image(s), relative to the imaging sensor. For example, the measurement system may determine whether the imaging sensor is being held level (or within a defined angle from level) with the user’s face (such that the images depict the head from straight on), below the user’s face (such that the images depict the head from below), and the like. In some aspects, this may be referred to as “level criteria.”
[00101] If the measurement system determines that the user’s head is not level with the camera, the method 600 continues to block 612, where the measurement system indicates the level criteria. For example, the measurement system may output an indication that the user
should position themselves and/or the imaging sensor such that the camera is level with the user’s face. The method 600 then returns to block 610.
[00102] If, at block 610, the measurement system determines that the level criteria are satisfied, the method 600 continues to block 614, where the measurement system determines whether the head of the user is within a defined distance, relative to the imaging sensor. For example, the measurement system may determine whether the estimated distance between the head and the imaging sensor is above a minimum threshold and/or below a maximum threshold (e.g., based on the relative size of the user’s head in the captured images). In some aspects, this may be referred to as “distance criteria.”
[00103] If the measurement system determines that the user’s head is too far or too close to the camera, the method 600 continues to block 616, where the measurement system indicates the distance criteria. For example, the measurement system may output an indication that the user should move themselves and/or the imaging sensor closer together or further apart such that the user’s face fills the indicated frame in the images. The method 600 then returns to block 614.
[00104] If, at block 614, the measurement system determines that the distance criteria are satisfied, the method 600 continues to block 618, where the measurement system captures one or more frontal images of the user’s face. In some aspects, if the measurement system has been continuously or repeatedly capturing images (e.g., to evaluate the frame criteria, the brightness criteria, the level criteria, and/or the distance criteria), “capturing” the frontal images at block 618 may correspond to saving, selecting, storing, or otherwise retaining or flagging one or more images (e.g., the images captured just before and/or after determining that block 614 was satisfied) for subsequent evaluation (e.g., to predict facial measurements and/or select a user interface).
[00105] In some aspects, the measurement system may output an indication that the frontal image(s) have been captured (e.g., via a camera shutter noise, a flash of white or green light on the screen, and the like). In some aspects, the measurement system may refrain from providing such indication (e.g., proceeding directly to block 620, such that the user may not be aware of which image(s) are evaluated for the measurement process).
[00106] At block 620, the measurement system instructs the user to turn their head towards the side (e.g., the right side or the left side, relative to the camera). For example, the
measurement system may instruct the user to turn their head slowly to the left or right. At block 622, the measurement system determines whether at least one of the user’s ears is visible in the most recently captured image (e.g., whether the full outline of the ear is visible, or whether the ear is obscured entirely or partially by hair, a hat, etc.). In some aspects, this may be referred to as “ear criteria.”
[00107] If the measurement system determines that the ear criteria are not satisfied, the method 600 continues to block 624, where the measurement system indicates the ear criteria. For example, the measurement system may output an indication that the user’s ear(s) should be visible (e.g., the ear on the opposite side of the head to which the user is turning), and instruct or ask the user to remove any impediments. The method 600 then returns to block 620.
[00108] If, at block 622, the measurement system determines that the ear(s) are visible, the method 600 continues via the block 626 to block 628, depicted in FIG. 6B. At block 628, the measurement system determines whether the user’s head is turned to at least a threshold angle relative to the imaging sensor (as depicted in the recently captured images). In some aspects, this is referred to as a “threshold angle criteria.” If the measurement system determines that the angle criteria are not met (e.g., the user’s head is not turned far enough to the side, or is turned too far to the side), the method 600 continues to block 630.
[00109] At block 630, the measurement system indicates the angle threshold. For example, the measurement system may output an indication that the user should turn their head further (e.g., away from the camera) or less (e.g., back towards the camera). The method 600 then returns to block 628.
[00110] If, at block 628, the measurement system determines that the threshold angle criteria are satisfied, the method 600 continues to block 632, where the measurement system captures one or more side images of the user’s face. In some aspects, if the measurement system has been continuously or repeatedly capturing images (e.g., to evaluate the frame criteria, the brightness criteria, the level criteria, the distance criteria, the ear criteria, the angle criteria, and the like), “capturing” the side images at block 632 may correspond to saving, selecting, storing, or otherwise retaining or flagging one or more images (e.g., the images captured just before and/or after determining that block 628 was satisfied) for subsequent evaluation (e.g., to predict facial measurements and/or select a user interface).
[00111] In some aspects, the measurement system may output an indication that the side image(s) have been captured (e.g., via a camera shutter noise, a flash of white or green light on the screen, an audio statement indicating that the user turned their head far enough, and the like). In some aspects, the measurement system may refrain from providing such indication (e.g., proceeding directly to block 634, such that the user may not be aware of which image(s) are evaluated for the measurement process).
[00112] At block 634, the measurement system determines whether there is at least one additional side or angle for which the measurement system needs to capture image(s) for analysis. For example, if the user turned their head to the left, the measurement system may determine that the system still needs to capture image(s) of the user turning their head to the right (and vice versa). If the measurement system has not yet captured image(s) from the other side of the user’s head, the method 600 returns, via the block 636, to block 620 of FIG. 6A. In some aspects, if the measurement system relies on image(s) from a single side (e.g., the left or the right) rather than both sides, block 634 may be bypassed.
[00113] If, at block 634, the measurement system determines that it has captured images from both sides of the user’s head (or from one side, if only one side is needed), the method 600 continues to block 638. At block 638, the measurement system instructs the user to tilt their head (e.g., upwards, relative to the camera).
[00114] At block 640, the measurement system determines whether the user’s head is tilted to at least a threshold angle relative to the imaging sensor (as depicted in the recently captured images). In some aspects, this is referred to as a “tilt criteria” or simply as a “threshold angle criteria,” as discussed above. If the measurement system determines that the angle or tilt criteria are not met (e.g., the user’s head is not turned or tilted far enough up, or is turned or tilted too far upwards), the method 600 continues to block 642.
[00115] At block 642, the measurement system indicates the angle or tilt threshold. For example, the measurement system may output an indication that the user should turn their head further up (e.g., away from the camera) or not as far up (e.g., nearer to the camera). The method 600 then returns to block 638.
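The turn checks (blocks 628 and 630) and the tilt checks (blocks 640 and 642) follow the same pattern: compare an estimated head angle against a lower and an upper bound and emit corrective guidance when the bound is missed. The following is a minimal sketch of that pattern; the pose estimator, the threshold values, and the feedback wording are illustrative assumptions rather than the actual implementation.

```python
# Sketch of the threshold angle / tilt checks (blocks 628-630 and 640-642).
# The angle source, bounds, and messages are assumed for illustration only.

from dataclasses import dataclass

@dataclass
class AngleThresholds:
    min_deg: float  # head must be rotated at least this far
    max_deg: float  # but not beyond this

def check_head_angle(estimated_deg: float, thresholds: AngleThresholds) -> tuple[bool, str | None]:
    """Return (criteria_satisfied, feedback) for one estimated yaw (turn) or pitch (tilt) value."""
    if estimated_deg < thresholds.min_deg:
        return False, "Please turn your head a little further."
    if estimated_deg > thresholds.max_deg:
        return False, "Please turn your head back toward the camera slightly."
    return True, None

# Example: a side profile might require roughly 60-90 degrees of yaw (assumed values).
ok, feedback = check_head_angle(42.0, AngleThresholds(60.0, 90.0))
print(ok, feedback)  # False "Please turn your head a little further."
```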
[00116] If, at block 640, the measurement system determines that the threshold tilt criteria are satisfied, the method 600 continues to block 644, where the measurement system captures one or more tilt images of the user’s face. In some aspects, if the measurement system has
been continuously or repeatedly capturing images (e.g., to evaluate the frame criteria, the brightness criteria, the level criteria, the distance criteria, the ear criteria, the angle criteria, and the like), “capturing” the tilt images at block 644 may correspond to saving, selecting, storing, or otherwise retaining or flagging one or more images (e.g., the images captured just before and/or after determining that block 640 was satisfied) for subsequent evaluation (e.g., to predict facial measurements and/or select a user interface).
[00117] Generally, using the method 600, the measurement system can capture any number of images for processing to predict or estimate the dimensions or measurements of the user’s face. In some aspects, each instruction or indication provided during the method 600 may generally be provided in a variety of modalities, including via textual output on the display, audio (e.g., spoken) output via one or more speakers, haptic output, and the like.
[00118] Further, although the illustrated example depicts a sequence of evaluations, in some aspects, some or all of the evaluations may be performed partially or entirely in parallel. In some aspects, the measurement system continuously or periodically captures images while the method 600 is performed, such that each evaluation is performed on one or more of the most recently captured images. In some aspects, capturing or recording images for further evaluation (e.g., at blocks 618, 632, and 644) may correspond to saving the image(s) for future use, while the remaining images captured during the method 600 may be discarded.
[00119] In some aspects, determining whether or not one or more of the indicated criteria are satisfied may include evaluating the criteria for a sequence of images (e.g., at least five images in a row), over a period of time (e.g., whether the criteria are satisfied after five seconds), and the like. In some aspects, determining that the criteria are satisfied may be performed based on a single image (e.g., allowing the method 600 to proceed quickly to the next step) while determining that the criteria are not satisfied may be performed based on a set of multiple images and/or a minimum amount of time without the criteria being satisfied.
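The asymmetric evaluation described above (pass on a single frame, but only report failure after several consecutive frames or a minimum time) can be expressed as a small debouncer. The sketch below is illustrative only; the class name, frame count, and time window are assumptions, and the disclosure allows the frame and time conditions to be used individually or together.

```python
# Illustrative debouncer: satisfied as soon as one frame passes, "not satisfied"
# only after several consecutive misses AND a minimum elapsed time (assumed policy).

import time

class CriterionDebouncer:
    def __init__(self, fail_frames: int = 5, fail_seconds: float = 5.0):
        self.fail_frames = fail_frames
        self.fail_seconds = fail_seconds
        self._misses = 0
        self._first_miss_time: float | None = None

    def update(self, frame_passes: bool) -> str:
        """Return 'satisfied', 'pending', or 'not_satisfied' for the latest frame."""
        if frame_passes:
            self._misses = 0
            self._first_miss_time = None
            return "satisfied"
        self._misses += 1
        if self._first_miss_time is None:
            self._first_miss_time = time.monotonic()
        elapsed = time.monotonic() - self._first_miss_time
        if self._misses >= self.fail_frames and elapsed >= self.fail_seconds:
            return "not_satisfied"  # e.g., trigger corrective guidance to the user
        return "pending"
```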
[00120] In some aspects, the measurement system may “capture” multiple images at each position (e.g., a sequence of images from the frontal view, from each side, and from the tilt angle). In some aspects, the measurement system may then select the best image(s) for downstream processing. For example, from each pool or set of images (e.g., for each angle), the measurement system may select the image having the least blur, the best lighting, and the like.
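One common way to pick the sharpest image from such a pool is the variance of the Laplacian, which penalizes blur. The sketch below assumes OpenCV and NumPy and scores only blur; a real system may combine blur with lighting or other quality measures, as noted above.

```python
# Sketch: select the least-blurred image from a pool captured at one pose,
# scoring sharpness as the variance of the Laplacian (OpenCV/NumPy assumed).

import cv2
import numpy as np

def blur_score(image_bgr: np.ndarray) -> float:
    """Higher is sharper: variance of the Laplacian of the grayscale image."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    return float(cv2.Laplacian(gray, cv2.CV_64F).var())

def select_best_image(images: list[np.ndarray]) -> np.ndarray:
    """Pick the sharpest image from a pool captured at a single angle."""
    return max(images, key=blur_score)
```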
Example Workflow for Collecting User Image Data
[00121] FIG. 7 depicts an example workflow 700 for collecting user image data, according to some embodiments of the present disclosure. In some embodiments, the workflow 700 may be used by a measurement system, such as the measurement system 110 of FIG. 1 and/or the measurement systems discussed above with reference to FIGS. 2-6. In some embodiments, the workflow 700 is used to facilitate capture of user images, as discussed above.
[00122] In some aspects, the workflow 700 is implemented by displaying user interfaces 705 (e.g., GUIs) via a display, such as on a smartphone, a laptop computer, a desktop computer, and the like. In some aspects, the workflow 700 is used to guide users to turn or tilt their heads to desired angles, as discussed above. Specifically, in the illustrated embodiment, the user interface 705A may include a frame 710 and one or more directional indicators 715. The frame 710 may generally correspond to any visual indication of where the user should position their head in the captured images.
[00123] For example, as discussed above, the measurement system may continuously capture images (e.g., a video or sequence of frames) and display the images on the user interface 705A, where the frame 710 may be superimposed over the stream and/or may occlude some or all of the images. As one example, the background of the user interface 705A may be opaque, and the frame 710 may be a window (e.g., an ellipse or other shape) showing a portion of the captured images. The user may be instructed to align their head with the frame 710.
[00124] Generally, the directional indicators 715 may correspond to any visual indication of which direction the user should turn or tilt their head. For example, the illustrated user interface 705A depicts the directional indicators 715 pointing from the frame 710 to the left (e.g., asking the user to turn their head to the left). In other examples, the directional indicators 715 may be depicted pointing from the frame 710 to the right (e.g., asking the user to turn their head to the right) and/or pointing from the frame 710 upwards (e.g., asking the user to tilt their head up).
[00125] Although the illustrated example depicts arrows as the directional indicators 715, the directional indicators 715 may generally take any form (including non-visual indicators, such as audio recordings indicating the direction). Also illustrated in the user interface 705A is a threshold indicator 720. The threshold indicator 720 may be used to visually indicate how far the user should turn their head in the indicated direction. For example, as
illustrated in the user interface 705B, a portion 725 of the directional indicators 715 is shaded (indicated by stippling) to indicate the amount that the user has turned their head, relative to the camera. That is, the measurement system may monitor the angle that the user’s head is facing relative to the camera, and may update the portion 725 to visually indicate the angle. For example, the portion 725 may start at a first size (e.g., small or invisible) when the user is facing directly towards the camera. As the user’s head angle increases relative to the camera, the measurement system may expand the portion 725 to indicate the amount that the user’s head has turned relative to the frontal angle.
[00126] As illustrated on the user interface 705B, the user has turned their head somewhat to the left, but has not yet turned their head far enough to satisfy the angle criteria (indicated by the threshold indicator 720). In some aspects, the interface is updated in real-time (or near-real-time) based on the determined angle of the user’s head. For example, the measurement system may dynamically change the size of the portion 725 to reflect the current angle.
[00127] Generally, the portion 725 may be implemented using any suitable technique. For example, the measurement system may cause the portion 725 to be a different color than the directional indicators 715 (e.g., where the indicators are white and the portion 725 fills in a color such as green). In some aspects, in addition to or instead of visually indicating the angle (via the portion 725), the measurement system may output other indications such as audio indications (e.g., giving the current estimated angle, instructing the user to turn a bit further, and the like).
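One simple way to drive the shaded portion 725 is to map the current head angle onto a 0-to-1 fill fraction relative to the threshold indicated by the threshold indicator 720. The sketch below assumes a yaw value from whatever pose estimator the system uses; the function name and threshold are illustrative.

```python
# Minimal sketch: fill fraction for the directional indicator, growing from 0
# (facing the camera) to 1 at the threshold indicator 720. Values are assumed.

def indicator_fill_fraction(current_yaw_deg: float, threshold_deg: float) -> float:
    """Fraction of the directional indicator to shade for the current head yaw."""
    if threshold_deg <= 0:
        return 0.0
    return max(0.0, min(1.0, abs(current_yaw_deg) / threshold_deg))

# e.g., a 40-degree turn toward a 60-degree threshold shades two thirds of the arrow
print(indicator_fill_fraction(40.0, 60.0))  # 0.666...
```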
[00128] In the user interface 705C, the user has turned their head beyond the minimum threshold (indicated by the threshold indicator 720), as indicated by the shaded portion 725 of the directional indicators 715. In some aspects, as discussed above, the measurement system may capture one or more image(s) at this angle for subsequent evaluation. In the illustrated example, the background of the user interface 705C has been updated to reflect or indicate that the angle criteria have been satisfied. For example, the measurement system may cause the background to change color (e.g., to green), to flash one or more colors or graphics, and the like. Such indications, which may be more visible in the user’s peripheral vision (while their head is turned to the side) as compared to smaller indicators such as the portion 725, may help the user recognize that the image(s) have been captured and they can turn back toward the camera. In some aspects, in addition to or instead of changing the background of the interface,
the measurement system may use other indications such as audio output (e.g., a camera shutter sound effect or an oral statement that the images are captured) to assist the user.
Example Method for Improved Interface Selection
[00129] FIG. 8 is a flow diagram depicting a method 800 for improved interface selection, according to some embodiments of the present disclosure. In some embodiments, the method 800 may be performed by a measurement system, such as the measurement system 110 of FIG. 1 and/or the measurement systems discussed above with reference to FIGS. 2-7. In some embodiments, the method 800 provides additional detail for block 425 of FIG. 4.
[00130] At block 805, the measurement system selects a pillow size for the user by processing one or more nostril measurements using a trained machine learning model. For example, as discussed above, the measurement system (or another system) may train a classifier model to classify nostril measurements into pillow sizes. In some embodiments, as discussed above, the nostril measurements may include parameters such as the major and minor axes of an ellipse that corresponds to the nostril, the rotation of the ellipse or nostril, the distance between the ellipse or nostril and the centerline of the user’s nose, and the like. Using such user-specific measurements and machine learning can result in a pillow fitting that is far more comfortable and accurate (as well as far easier and more sanitary, as compared to a guess-and-check approach).
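For concreteness, the nostril parameters listed above can be packaged as a small feature vector. The field names and units in the sketch below are assumptions used only to illustrate how such measurements might be fed to a downstream classifier such as the one sketched after the next paragraph.

```python
# Hypothetical container for the nostril ellipse measurements described above.
# Field names and units (millimeters, degrees) are illustrative assumptions.

from dataclasses import dataclass
import numpy as np

@dataclass
class NostrilMeasurements:
    major_axis_mm: float
    minor_axis_mm: float
    rotation_deg: float
    centerline_offset_mm: float

    def to_features(self) -> np.ndarray:
        return np.array(
            [self.major_axis_mm, self.minor_axis_mm,
             self.rotation_deg, self.centerline_offset_mm],
            dtype=np.float32,
        )
```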
[00131] Generally, the particular operations used to train the classifier machine learning model may vary depending on the particular architecture. For example, in some embodiments, the measurement system (or other training system) may process the nostril measurements of a training sample as input to the model to generate a classification (e.g., to select a pillow size). The classification may then be compared against the label (e.g., the actual pillow size appropriate and/or comfortable for the user, based on their nostrils) to generate a loss. The loss may generally use a variety of formulations, such as cross-entropy loss. In some embodiments, the parameters of the model may then be updated (e.g., using backpropagation) based on the loss. In this way, the model learns to generate more accurate pillow size classifications based on input measurements. In embodiments, the model may be trained using individual samples (e.g., using stochastic gradient descent) and/or batches of samples (e.g., using batch gradient descent) over any number of iterations and/or epochs.
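The training loop described above (forward pass, cross-entropy loss, backpropagation, parameter update) can be sketched as follows. The sketch assumes PyTorch and a small multilayer perceptron; the architecture, number of pillow sizes, and hyperparameters are placeholders, not the actual model disclosed.

```python
# Illustrative training step for a pillow-size classifier over nostril features.
# PyTorch, layer sizes, and hyperparameters are assumptions for this sketch.

import torch
from torch import nn

NUM_FEATURES = 4      # e.g., major axis, minor axis, rotation, centerline offset
NUM_PILLOW_SIZES = 3  # e.g., small, medium, large (assumed)

model = nn.Sequential(
    nn.Linear(NUM_FEATURES, 32),
    nn.ReLU(),
    nn.Linear(32, NUM_PILLOW_SIZES),
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

def train_step(features: torch.Tensor, labels: torch.Tensor) -> float:
    """One batch update: forward pass, cross-entropy loss, backpropagation."""
    optimizer.zero_grad()
    logits = model(features)        # shape: (batch, NUM_PILLOW_SIZES)
    loss = loss_fn(logits, labels)  # labels: (batch,) integer class indices
    loss.backward()
    optimizer.step()
    return loss.item()

# At inference time, the predicted pillow size is the arg-max over the logits:
# predicted_size = model(features).argmax(dim=-1)
```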
[00132] At block 810, the measurement system selects a conduit size based on one or more head measurements (e.g., measurements of the length of the conduit path from the user’s nose and/or mouth, up across the cheekbones, and over the user’s ears). For example, the measurement system may select a dynamic conduit size based on the measurement (e.g., selecting conduit that is the same length as the conduit path), or may select one of a predefined set of alternative conduit sizes (e.g., using a defined mapping between facial measurements and conduit size, such as the mapping 315 of FIG. 3). Using this user-specific measurement can result in greatly improved conduit sizing, as compared to more generic or less accurate approaches.
[00133] At block 815, the measurement system selects a headgear size based on one or more head measurements (e.g., measurements of the occipitofrontal circumference of the user’s head). In some embodiments, to facilitate or improve headgear sizing, the measurement system may fit a statistical shape model of a human head to the mesh, such that the circumference of the head can be estimated (even if the back of the user’s head is not imaged). For example, the measurement system may select a dynamic headgear size based on the measurement (e.g., indicating to use headgear that is the same size as the head circumference), or may select one of a predefined set of alternative headgear sizes (e.g., using a defined mapping between facial measurements and headgear size, such as the mapping 315 of FIG. 3). Using this user-specific measurement can result in greatly improved headgear sizing, as compared to more generic or less accurate approaches.
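One way to complete the unobserved back of the head, as described above, is to fit a PCA-style statistical shape model to the visible vertices and then measure the circumference on the completed model. The sketch below assumes such a shape model with per-vertex correspondence and an already-defined circumference path; all of those are assumptions for illustration.

```python
# Sketch: fit a PCA-style head shape model to visible vertices, then measure
# circumference on the completed head. Shape model and correspondence assumed.

import numpy as np

def fit_shape_model(observed: np.ndarray, visible_idx: np.ndarray,
                    mean_shape: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Least-squares fit of shape coefficients using only the visible vertices.

    observed:    (V, 3) user mesh vertices in model correspondence
    visible_idx: indices of vertices reconstructed from the images
    mean_shape:  (N, 3) mean head of the statistical shape model
    components:  (K, N, 3) principal components of the shape model
    Returns the completed (N, 3) head, including the unobserved back of the head.
    """
    residual = (observed[visible_idx] - mean_shape[visible_idx]).ravel()
    basis = components[:, visible_idx, :].reshape(components.shape[0], -1).T  # (3V, K)
    coeffs, *_ = np.linalg.lstsq(basis, residual, rcond=None)
    return mean_shape + np.tensordot(coeffs, components, axes=1)

def circumference(points_on_loop: np.ndarray) -> float:
    """Sum of edge lengths around an ordered, closed loop of 3D points."""
    diffs = np.diff(np.vstack([points_on_loop, points_on_loop[:1]]), axis=0)
    return float(np.linalg.norm(diffs, axis=1).sum())
```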
[00134] At block 820, the measurement system selects a user interface (e.g., a nasal mask, a full-face mask, and/or a nasal pillow mask) based on one or more head or facial measurements (e.g., measurements of the nose and/or mouth of the user). For example, the measurement system may select one of a predefined set of alternative interfaces (e.g., using a defined mapping between facial measurements and interfaces, such as the mapping 315 of FIG. 3). Using these user-specific measurements can result in a greatly improved interface fit, as compared to more generic or less accurate approaches.
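The mapping-based selections at blocks 810, 815, and 820 share a common pattern: associate each component size with a range of a facial measurement and return the entry whose range contains the measurement. The sketch below illustrates that pattern; the size labels and range boundaries are invented for illustration and do not come from the disclosure.

```python
# Minimal sketch of a range-based mapping (in the spirit of the mapping 315).
# Labels and boundaries are illustrative assumptions only.

HEADGEAR_MAPPING = [
    # (size label, min measurement inclusive, max measurement exclusive), in mm
    ("small", 0.0, 540.0),
    ("standard", 540.0, 580.0),
    ("large", 580.0, float("inf")),
]

def select_from_mapping(measurement_mm: float, mapping) -> str:
    for label, lo, hi in mapping:
        if lo <= measurement_mm < hi:
            return label
    raise ValueError(f"No mapping entry covers measurement {measurement_mm}")

print(select_from_mapping(565.0, HEADGEAR_MAPPING))  # "standard"
```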
[00135] In some embodiments, as discussed above, the measurement system may select multiple interface alternatives. For example, the measurement system may select one interface of each type (e.g., one nasal interface, one nasal pillow interface, and one full-face interface), or may select multiple alternatives within each type category. In some embodiments, the particular category (or categories) for which the measurement system generates a selection
may depend on user input. For example, the user may specify that they would like a recommended nasal mask. In some embodiments, the measurement system may generate the selection based on predicted user preference or fit (e.g., based on the facial measurements). For example, the measurement system may determine or infer, based on the facial measurements, that a particular type or category of interface will likely be the most comfortable for the user.
[00136] In some embodiments, as discussed above, the measurement system may select the interface component(s) based at least in part on user responses to various questions or surveys. For example, the measurement system may select an interface type based on responses related to the user’s prior interface usage (e.g., if they already tried a pillow interface and did not like it), based on the user’s comfort level with various types, based on the user’s tendency to breathe through their mouth or their nose, and the like. As another example, the measurement system may select the interface type based on the user’s response to a breathing exercise (e.g., where the user is asked to breathe through their nose in synchronization with an animation), as discussed above. For example, the measurement system may select a full-face interface type for users who have difficulty breathing through their nose, and a nasal and/or pillow type for users who report potential claustrophobia with full face masks.
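Survey-driven selection of this kind can be expressed as a simple rule-based pre-filter over candidate interface types. The rules, rule ordering, and field names below are assumptions for illustration; a deployed system may instead combine these responses with the measurement-based scoring discussed above.

```python
# Illustrative rule-based pre-filter over interface types from survey responses.
# Rules and field names are assumptions, not the disclosed selection logic.

from dataclasses import dataclass

@dataclass
class SurveyResponses:
    mouth_breather: bool
    difficulty_nose_breathing: bool
    claustrophobic_with_full_face: bool
    disliked_pillow_interface: bool

def candidate_interface_types(r: SurveyResponses) -> list[str]:
    if r.difficulty_nose_breathing or r.mouth_breather:
        return ["full-face"]
    types = ["nasal", "nasal pillow", "full-face"]
    if r.claustrophobic_with_full_face:
        types.remove("full-face")
    if r.disliked_pillow_interface:
        types.remove("nasal pillow")
    return types
```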
[00137] Generally, the particular component(s) selected by the measurement system may vary depending on the particular task and implementation. The illustrated examples (e.g., a pillow size, a conduit size, a headgear size, and an interface model) are depicted for conceptual clarity. In various embodiments, however, the measurement system may select additional components not pictured, or may select a subset of the illustrated components, for the user.
Example Method for Using Machine Learning to Select User Interfaces
[00138] FIG. 9 is a flow diagram depicting a method 900 for using machine learning to select user interfaces, according to some embodiments of the present disclosure. In some embodiments, the method 900 may be performed by a measurement system, such as the measurement system 110 of FIG. 1 and/or the measurement systems discussed above with reference to FIGS. 2-8.
[00139] At block 905, a set of two-dimensional images (e.g., the images 105 of FIGS. 1-2) of a user is accessed.
[00140] At block 910, a three-dimensional mesh (e.g., the mesh 130 of FIG. 1) depicting a head of the user is generated based on processing the set of two-dimensional images using a first machine learning model, wherein the three-dimensional mesh is scaled to a size of the head of the user.
[00141] At block 915, the three-dimensional mesh is modified to remove one or more facial expressions.
[00142] At block 920, a set of facial measurements (e.g., the measurements 140 of FIGS. 1 and/or 3) is determined based on the modified three-dimensional mesh.
[00143] At block 925, a user interface (e.g., the interface 150 of FIGS. 1 and/or 3) is selected for the user based on the set of facial measurements.
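Taken together, blocks 905 through 925 compose into a single pipeline. The sketch below shows that composition with each stage injected as a callable; the concrete mesh model, expression neutralization, measurement, and selection logic are described elsewhere in this disclosure and are not reproduced here.

```python
# Structural sketch of the method 900 pipeline; each stage is a caller-provided
# callable standing in for the components described elsewhere in this disclosure.

def run_interface_selection_pipeline(images, mesh_model, neutralize_expression,
                                     measure_face, select_interface):
    """Compose the stages of method 900 over an already-loaded image set (block 905)."""
    mesh = mesh_model(images)                   # block 910: images -> scaled 3D mesh
    neutral_mesh = neutralize_expression(mesh)  # block 915: remove facial expressions
    measurements = measure_face(neutral_mesh)   # block 920: facial measurements from the mesh
    return select_interface(measurements)       # block 925: recommended user interface
```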
Example Computing Device for Mesh Generation and Interface Selection
[00144] FIG. 10 depicts an example computing device 1000 configured to perform various embodiments of the present disclosure. Although depicted as a physical device, in embodiments, the computing device 1000 may be implemented using virtual device(s), and/or across a number of devices (e.g., in a cloud environment). In one embodiment, the computing device 1000 corresponds to a measurement system, such as the measurement system 110 of FIG. 1 and/or the measurement systems discussed above with reference to FIGS. 2-9.
[00145] As illustrated, the computing device 1000 includes a CPU 1005, memory 1010, a network interface 1025, and one or more I/O interfaces 1020. In the illustrated embodiment, the CPU 1005 retrieves and executes programming instructions stored in memory 1010, as well as stores and retrieves application data residing in one or more storage repositories (not depicted). The CPU 1005 is generally representative of a single CPU and/or GPU, multiple CPUs and/or GPUs, a single CPU and/or GPU having multiple processing cores, and the like. The memory 1010 is generally included to be representative of a random access memory. In some embodiments, the computing device 1000 may include storage (not depicted) which may be any combination of disk drives, flash-based storage devices, and the like, and may include fixed and/or removable storage devices, such as fixed disk drives, removable memory cards, caches, optical storage, network attached storage (NAS), or storage area networks (SAN).
[00146] In some embodiments, I/O devices 1035 (such as keyboards, monitors, etc.) are connected via the I/O interface(s) 1020. Further, via the network interface 1025, the computing
device 1000 can be communicatively coupled with one or more other devices and components (e.g., via a network, which may include the Internet, local network(s), and the like). As illustrated, the CPU 1005, memory 1010, network interface(s) 1025, and I/O interface(s) 1020 are communicatively coupled by one or more buses 1030.
[00147] In the illustrated embodiment, the memory 1010 includes an image component 1050, a mesh component 1055, a measurement component 1060, and a selection component 1065, which may perform one or more embodiments discussed above. Although depicted as discrete components for conceptual clarity, in embodiments, the operations of the depicted components (and others not illustrated) may be combined or distributed across any number of components. Further, although depicted as software residing in memory 1010, in embodiments, the operations of the depicted components (and others not illustrated) may be implemented using hardware, software, or a combination of hardware and software.
[00148] In some embodiments, the image component 1050 (which may correspond to the image component 115 of FIGS. 1-2) may be used to access, evaluate, and/or preprocess images (e.g., the images 105 of FIGS. 1-2), as discussed above. For example, the image component 1050 may transmit or output instructions indicating how to capture the image(s), preprocess the image(s), and/or evaluate the image(s) to confirm that they meet acceptance criteria (e.g., whether an ear is visible in the profile image(s)).
[00149] In some embodiments, the mesh component 1055 (which may correspond to the mesh component 125 of FIG. 1) may be used to generate three-dimensional meshes (e.g., the mesh 130 of FIG. 1) based on two-dimensional images, as discussed above. For example, the mesh component 1055 may process the image(s) using one or more deep learning models (or other machine learning models) trained based on image data and corresponding point cloud data (or other three-dimensional data, such as mesh data) for user faces.
[00150] In some embodiments, the measurement component 1060 (which may correspond to the measurement component 135 of FIG. 1) may be used to generate facial measurements (e.g., the measurements 140 of FIGS. 1 and/or 3), as discussed above. For example, the measurement component 1060 may collect measurements such as those defining the size, shape, positioning, and/or rotation of various facial landmarks or features, such as the mouth, the nose, the ears, the nostrils, and the like.
[00151] In some embodiments, the selection component 1065 (which may correspond to the selection component 145 of FIGS. 1 and/or 3) may be used to generate or select interface components (e.g., the interface 150 of FIGS. 1 and/or 3), as discussed above. For example, the selection component 1065 may use mappings (such as the mappings 1070) to identify appropriate component(s) based on the facial measurements, and/or may process one or more of the measurements (e.g., the nostril parameters) using one or more secondary machine learning models to predict or identify the best interface component.
[00152] In the illustrated example, the memory 1010 further includes mapping(s) 1070 and model parameter(s) 1075 for one or more machine learning models. In some embodiments, the mappings 1070 (which may correspond to the mappings 315 of FIG. 3) generally include mappings indicating, for one or more interface components, a set of facial measurements (e.g., a range of measurements) for which the component is acceptable or suitable. Alternatively, the mappings 1070 may indicate, for one or more ranges of facial measurements, a set of interface components that are acceptable or suitable. The model parameters 1075 may generally include parameters for any number of models, such as a mesh generation model (e.g., a deep learning model used by the mesh component 1055 to generate facial meshes), a landmark or feature detection model (e.g., an ear detection model used by the image component 1050 to determine whether the profile images are acceptable), a component classifier model (e.g., the nostril pillow classifier model discussed above, used by the selection component 1065 to select pillow sizing), and the like.
[00153] Although depicted as residing in memory 1010 for conceptual clarity, the mappings 1070 and model parameters 1075 may be stored in any suitable location, including one or more local storage repositories, or in one or more remote systems distinct from the computing device 1000.
Additional Considerations
[00154] The preceding description is provided to enable any person skilled in the art to practice the various embodiments described herein. The examples discussed herein are not limiting of the scope, applicability, or embodiments set forth in the claims. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments. For example, changes may be made in the function and arrangement of elements discussed without departing from the scope of the disclosure. Various examples may omit, substitute, or add various procedures
or components as appropriate. For instance, the methods described may be performed in an order different from that described, and various steps may be added, omitted, or combined. Also, features described with respect to some examples may be combined in some other examples. For example, an apparatus may be implemented or a method may be practiced using any number of the embodiments set forth herein. In addition, the scope of the disclosure is intended to cover such an apparatus or method that is practiced using other structure, functionality, or structure and functionality in addition to, or other than, the various embodiments of the disclosure set forth herein. It should be understood that any embodiment of the disclosure disclosed herein may be embodied by one or more elements of a claim.
[00155] As used herein, the word “exemplary” means “serving as an example, instance, or illustration.” Any embodiment described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments.
[00156] As used herein, a phrase referring to “at least one of” a list of items refers to any combination of those items, including single members. As an example, “at least one of: a, b, or c” is intended to cover a, b, c, a-b, a-c, b-c, and a-b-c, as well as any combination with multiples of the same element (e.g., a-a, a-a-a, a-a-b, a-a-c, a-b-b, a-c-c, b-b, b-b-b, b-b-c, c-c, and c-c-c or any other ordering of a, b, and c).
[00157] As used herein, the term “determining” encompasses a wide variety of actions. For example, “determining” may include calculating, computing, processing, deriving, investigating, looking up (e.g., looking up in a table, a database or another data structure), ascertaining and the like. Also, “determining” may include receiving (e.g., receiving information), accessing (e.g., accessing data in a memory) and the like. Also, “determining” may include resolving, selecting, choosing, establishing and the like.
[00158] The methods disclosed herein comprise one or more steps or actions for achieving the methods. The method steps and/or actions may be interchanged with one another without departing from the scope of the claims. In other words, unless a specific order of steps or actions is specified, the order and/or use of specific steps and/or actions may be modified without departing from the scope of the claims. Further, the various operations of methods described above may be performed by any suitable means capable of performing the corresponding functions. The means may include various hardware and/or software component(s) and/or module(s), including, but not limited to a circuit, an application specific
integrated circuit (ASIC), or processor. Generally, where there are operations illustrated in figures, those operations may have corresponding counterpart means-plus-function components with similar numbering.
[00159] Embodiments of the invention may be provided to end users through a cloud computing infrastructure. Cloud computing generally refers to the provision of scalable computing resources as a service over a network. More formally, cloud computing may be defined as a computing capability that provides an abstraction between the computing resource and its underlying technical architecture (e.g., servers, storage, networks), enabling convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction. Thus, cloud computing allows a user to access virtual computing resources (e.g., storage, data, applications, and even complete virtualized computing systems) in “the cloud,” without regard for the underlying physical systems (or locations of those systems) used to provide the computing resources.
[00160] Typically, cloud computing resources are provided to a user on a pay-per-use basis, where users are charged only for the computing resources actually used (e.g. an amount of storage space consumed by a user or a number of virtualized systems instantiated by the user). A user can access any of the resources that reside in the cloud at any time, and from anywhere across the Internet. In context of the present invention, a user may access applications or systems (e.g., measurement system 110 of FIG. 1) or related data available in the cloud. For example, the measurement system could execute on a computing system in the cloud and train/use machine learning models to generate facial meshes and select interface components. In such a case, the measurement system could maintain the models in the cloud, and use them to drive improved interface recommendations. Doing so allows a user to access this information from any computing system attached to a network connected to the cloud (e.g., the Internet).
[00161] The following claims are not intended to be limited to the embodiments shown herein, but are to be accorded the full scope consistent with the language of the claims. Within a claim, reference to an element in the singular is not intended to mean “one and only one” unless specifically so stated, but rather “one or more.” Unless specifically stated otherwise, the term “some” refers to one or more. No claim element is to be construed under the provisions of 35 U.S.C. § 112(f) unless the element is expressly recited using the phrase “means
for” or, in the case of a method claim, the element is recited using the phrase “step for.” All structural and functional equivalents to the elements of the various embodiments described throughout this disclosure that are known or later come to be known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the claims. Moreover, nothing disclosed herein is intended to be dedicated to the public regardless of whether such disclosure is explicitly recited in the claims.
Example Clauses
[00162] Implementation examples are described in the following numbered clauses:
[00163] Clause 1: A method, comprising: accessing a set of two-dimensional images of a user; generating, based on processing the set of two-dimensional images using a first machine learning model, a three-dimensional mesh depicting a head of the user, wherein the three-dimensional mesh is scaled to a size of the head of the user; modifying the three-dimensional mesh to remove one or more facial expressions; determining a set of facial measurements based on the modified three-dimensional mesh; and selecting a user interface for the user based on the set of facial measurements.
[00164] Clause 2: The method of Clause 1, wherein selecting the user interface comprises generating a recommended pillow size for the user interface based on a set of nostril measurements of the set of facial measurements.
[00165] Clause 3: The method of Clause 2, wherein the set of nostril measurements define at least a first ellipse and comprise at least one of: (i) a major axis, (ii) a minor axis, (iii) a rotation, or (iv) a distance of the first ellipse from a center of a nose of the user.
[00166] Clause 4: The method of any of Clauses 2-3, wherein generating the recommended pillow size comprises processing the set of nostril measurements using a second machine learning model.
[00167] Clause 5: The method of any of Clauses 1-4, wherein selecting the user interface comprises generating a recommended conduit size for the user interface based on the set of facial measurements.
[00168] Clause 6: The method of any of Clauses 1-5, wherein selecting the user interface comprises generating a recommended headgear size for the user interface based on the set of facial measurements.
[00169] Clause 7: The method of Clause 6, wherein generating the recommended headgear size comprises fitting a statistical shape model of a human head to the three-dimensional mesh.
[00170] Clause 8: The method of any of Clauses 1-7, wherein, prior to generating the three-dimensional mesh, at least one two-dimensional image of the set of two-dimensional images was processed using a second machine learning model to detect presence of an ear of the user in the at least one two-dimensional image.
[00171] Clause 9: The method of any of Clauses 1-8, wherein the first machine learning model was trained based on a set of training images depicting a training user and a corresponding set of three-dimensional data points for a head of the training user.
[00172] Clause 10: The method of Clause 9, wherein the first machine learning model does not use a camera model to generate the three-dimensional mesh.
[00173] Clause 11: The method of any of Clauses 1-10, wherein the set of two-dimensional images comprise an image depicting a left side of the head of the user, an image depicting a right side of the head of the user, an image depicting a front of the head of the user, and an image depicting a bottom of the head of the user.
[00174] Clause 12: The method of any of Clauses 1-11, further comprising, after selecting the user interface, deleting the set of two-dimensional images, the three-dimensional mesh, and the set of facial measurements.
[00175] Clause 13: The method of any of Clauses 1-12, further comprising: providing one or more requests for information to the user, wherein the one or more requests for information ask the user to indicate whether they experience difficulty breathing through their nose; receiving, from the user, one or more responses to the one or more requests; and selecting the user interface based further on the one or more responses.
[00176] Clause 14: The method of any of Clauses 1-13, wherein accessing the set of two-dimensional images comprises: capturing a first image depicting the user; in response to determining that the first image satisfies at least one of (i) a frame criteria, (ii) a brightness
criteria, (iii) a level criteria, or (iv) a distance criteria, outputting an instruction to the user to turn the head of the user; capturing a second image depicting the user; and in response to determining that the second image satisfies a threshold angle criteria, adding the second image to the set of two-dimensional images.
[00177] Clause 15: The method of Clause 14, wherein accessing the set of two-dimensional images further comprises, prior to capturing the second image: capturing a third image depicting the user; and in response to determining that the third image does not satisfy the threshold angle criteria, instructing the user to turn the head of the user further.
[00178] Clause 16: The method of any of Clauses 14-15, wherein accessing the set of two-dimensional images further comprises, prior to capturing the second image: capturing a third image depicting the user; and in response to determining that the third image does not depict at least one ear of the user, instructing the user to make the ear of the user visible.
[00179] Clause 17: The method of any of Clauses 14-16, wherein accessing the set of two-dimensional images further comprises, prior to capturing the second image: monitoring an angle to which the head of the user is turned; and outputting a visual indication of the angle relative to the threshold angle criteria.
[00180] Clause 18: The method of any of Clauses 14-17, further comprising, in response to determining that the second image satisfies the threshold angle criteria, outputting a visual indication that the threshold angle criteria are satisfied.
[00181] Clause 19: The method of any of Clauses 14-18, further comprising, in response to determining that the second image satisfies the threshold angle criteria, outputting an audio indication that the threshold angle criteria are satisfied.
[00182] Clause 20: A system, comprising: a memory comprising computer-executable instructions; and one or more processors configured to execute the computer-executable instructions and cause the processing system to perform a method in accordance with any one of Clauses 1-19.
[00183] Clause 21: A system, comprising means for performing a method in accordance with any one of Clauses 1-19.
[00184] Clause 22: A non-transitory computer-readable medium comprising computer-executable instructions that, when executed by one or more processors of a processing system, cause the processing system to perform a method in accordance with any one of Clauses 1-19.
[00185] Clause 23: A computer program product embodied on a computer-readable storage medium comprising code for performing a method in accordance with any one of Clauses 1-19.
Claims
1. A method, comprising: accessing a set of two-dimensional images of a user; generating, based on processing the set of two-dimensional images using a first machine learning model, a three-dimensional mesh depicting a head of the user, wherein the three-dimensional mesh is scaled to a size of the head of the user; modifying the three-dimensional mesh to remove one or more facial expressions; determining a set of facial measurements based on the modified three-dimensional mesh; and selecting a user interface for the user based on the set of facial measurements.
2. The method of claim 1, wherein selecting the user interface comprises generating a recommended pillow size for the user interface based on a set of nostril measurements of the set of facial measurements.
3. The method of claim 2, wherein the set of nostril measurements define at least a first ellipse and comprise at least one of: (i) a major axis, (ii) a minor axis, (iii) a rotation, or (iv) a distance of the first ellipse from a center of a nose of the user.
4. The method of claim 2, wherein generating the recommended pillow size comprises processing the set of nostril measurements using a second machine learning model.
5. The method of claim 1, wherein selecting the user interface comprises generating a recommended conduit size for the user interface based on the set of facial measurements.
6. The method of claim 1, wherein selecting the user interface comprises generating a recommended headgear size for the user interface based on the set of facial measurements.
7. The method of claim 6, wherein generating the recommended headgear size comprises fitting a statistical shape model of a human head to the three-dimensional mesh.
8. The method of claim 7, wherein, prior to generating the three-dimensional mesh, at least one two-dimensional image of the set of two-dimensional images was processed using a second machine learning model to detect presence of an ear of the user in the at least one two-dimensional image.
9. The method of claim 1, wherein the first machine learning model was trained based on a set of training images depicting a training user and a corresponding set of three-dimensional data points for a head of the training user.
10. The method of claim 9, wherein the first machine learning model does not use a camera model to generate the three-dimensional mesh.
11. The method of claim 1, wherein the set of two-dimensional images comprise an image depicting a left side of the head of the user, an image depicting a right side of the head of the user, an image depicting a front of the head of the user, and an image depicting a bottom of the head of the user.
12. The method of claim 1, further comprising, after selecting the user interface, deleting the set of two-dimensional images, the three-dimensional mesh, and the set of facial measurements.
13. The method of claim 1, further comprising: providing one or more requests for information to the user, wherein the one or more requests for information ask the user to indicate whether they experience difficulty breathing through their nose; receiving, from the user, one or more responses to the one or more requests; and selecting the user interface based further on the one or more responses.
14. The method of claim 1, wherein accessing the set of two-dimensional images comprises: capturing a first image depicting the user; in response to determining that the first image satisfies at least one of (i) a frame criteria, (ii) a brightness criteria, (iii) a level criteria, or (iv) a distance criteria, outputting an instruction to the user to turn the head of the user;
capturing a second image depicting the user; and in response to determining that the second image satisfies a threshold angle criteria, adding the second image to the set of two-dimensional images.
15. The method of claim 14, wherein accessing the set of two-dimensional images further comprises, prior to capturing the second image: capturing a third image depicting the user; and in response to determining that the third image does not satisfy the threshold angle criteria, instructing the user to turn the head of the user further.
16. The method of claim 14, wherein accessing the set of two-dimensional images further comprises, prior to capturing the second image: capturing a third image depicting the user; and in response to determining that the third image does not depict at least one ear of the user, instructing the user to make the ear of the user visible.
17. The method of claim 14, wherein accessing the set of two-dimensional images further comprises, prior to capturing the second image: monitoring an angle to which the head of the user is turned; and outputting a visual indication of the angle relative to the threshold angle criteria.
18. The method of claim 17, further comprising, in response to determining that the second image satisfies the threshold angle criteria, outputting a visual indication that the threshold angle criteria are satisfied.
19. The method of claim 18, further comprising, in response to determining that the second image satisfies the threshold angle criteria, outputting an audio indication that the threshold angle criteria are satisfied.
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202463565412P | 2024-03-14 | 2024-03-14 | |
| US63/565,412 | 2024-03-14 | ||
| US202463688394P | 2024-08-29 | 2024-08-29 | |
| US63/688,394 | 2024-08-29 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025189245A1 (en) | 2025-09-18 |
Family
ID=97062515
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/AU2025/050236 Pending WO2025189245A1 (en) | 2024-03-14 | 2025-03-14 | Machine learning for three-dimensional mesh generation based on images |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025189245A1 (en) |
Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8874251B2 (en) * | 2006-07-06 | 2014-10-28 | Airway Technologies, Llc | System and method for forming a custom medical mask from a three-dimensional electronic model |
| WO2022183116A1 (en) * | 2021-02-26 | 2022-09-01 | Resmed Inc. | System and method for continuous adjustment of personalized mask shape |
| US20220351467A1 (en) * | 2021-05-03 | 2022-11-03 | Ditto Technologies, Inc. | Generation of a 3d model of a reference object to perform scaling of a model of a user's head |
| US20230056800A1 (en) * | 2021-08-23 | 2023-02-23 | Sony Group Corporation | Shape refinement of three-dimensional (3d) mesh reconstructed from images |
| WO2023049929A1 (en) * | 2021-09-27 | 2023-03-30 | Resmed Digital Health Inc. | Machine learning to determine facial measurements via captured images |
| US20230260184A1 (en) * | 2022-02-17 | 2023-08-17 | Zoom Video Communications, Inc. | Facial expression identification and retargeting to an avatar |
| US20230364365A1 (en) * | 2022-05-10 | 2023-11-16 | ResMed Pty Ltd | Systems and methods for user interface comfort evaluation |
| US11887320B1 (en) * | 2020-03-26 | 2024-01-30 | Oceanit Laboratories, Inc. | System and method for producing custom fitted face masks and applications thereof |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 25769060; Country of ref document: EP; Kind code of ref document: A1 |