
WO2025029914A1 - Multimodal medical condition identification - Google Patents

Multimodal medical condition identification

Info

Publication number
WO2025029914A1
WO2025029914A1 (PCT/US2024/040376)
Authority
WO
WIPO (PCT)
Prior art keywords
data
subject
obtaining
medical
engine
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2024/040376
Other languages
English (en)
Inventor
Narayanan RAMANATHAN
Ikshvanku BAROT
Jatin Maniar
Hitesh Kalra
Meir KRYGER
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Soliish Inc
Original Assignee
Soliish Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Soliish Inc filed Critical Soliish Inc
Publication of WO2025029914A1


Classifications

    • G06T 7/0012 Biomedical image inspection
    • G06T 7/11 Region-based segmentation
    • G06T 7/75 Determining position or orientation of objects or cameras using feature-based methods involving models
    • G16H 30/40 ICT specially adapted for processing medical images, e.g. editing
    • G16H 50/20 ICT specially adapted for computer-aided diagnosis, e.g. based on medical expert systems
    • G16H 50/30 ICT specially adapted for calculating health indices; for individual health risk assessment
    • G06T 2207/10016 Video; image sequence
    • G06T 2207/20076 Probabilistic image processing
    • G06T 2207/20081 Training; learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/30036 Dental; teeth
    • G06T 2207/30201 Face

Definitions

  • This specification describes technologies for improving medical condition determination, e.g., through screening or diagnosis, by using one or more machine learning models, trained using multi-modal data, to predict the presence of a medical condition.
  • Such techniques can reduce equipment and expense incurred by traditional diagnostic methods while providing accurate medical results.
  • the described techniques can facilitate low-cost, non-invasive and easy to administer medical diagnostics.
  • Medical diagnostics can include predictions of medical conditions, such as obstructive sleep apnea, hypertension, risk of stroke, heart attack, or a combination of these, among other medical conditions.
  • one innovative aspect of the subject matter described in this specification can be embodied in methods that include the actions of obtaining visual data representing at least one body part of a subject; obtaining non-visual data that corresponds to one or more biological characteristics of the subject; providing data representing (i) the visual data and (ii) the non-visual data to one or more machine learning models, wherein the one or more machine learning models are trained to predict presence of a medical condition; obtaining an output of the one or more machine learning models that is generated based on the one or more machine learning models processing the visual data and the non-visual data; and determining presence of the medical condition for the subject using the output of the one or more machine learning models.
  • Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.
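  • To make the flow of this aspect concrete, the sketch below shows one way the recited actions could be arranged in code: obtain visual and non-visual data, pass both to one or more trained models, and decide presence from their output. All names (MultiModalInput, predict_condition, the averaging and the 0.5 threshold) are illustrative assumptions, not the patented implementation.

```python
# Minimal sketch of the claimed flow; model objects are assumed to expose a
# predict(visual, non_visual) method returning a likelihood in [0, 1].
from dataclasses import dataclass
from typing import Any, Sequence


@dataclass
class MultiModalInput:
    visual: Any        # e.g., face or oral-cavity images / video frames
    non_visual: Any    # e.g., questionnaire answers, audio, wearable readings


def predict_condition(models: Sequence[Any], sample: MultiModalInput,
                      threshold: float = 0.5) -> bool:
    """Run trained models on visual + non-visual data and decide presence."""
    scores = [m.predict(sample.visual, sample.non_visual) for m in models]
    likelihood = sum(scores) / len(scores)   # simple averaging across models
    return likelihood >= threshold           # presence of the medical condition
```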
  • Feature 1 Obtaining the visual data representing the at least one body part of the subject comprises: obtaining an image representing the face of the subject.
  • Feature 2 Obtaining the visual data representing the at least one body part of the subject comprises: obtaining a first image representing a first body part of the subject from a first view perspective; and obtaining a second image representing a second body part of the subject from a second view perspective.
  • Feature 3 The first body part is the same as the second body part.
  • Feature 4 The first body part is different from the second body part.
  • Feature 5 Obtaining the visual data representing the at least one body part of the subject comprises: capturing image data from a front facing camera of a smartphone during a relative motion between the smartphone and the at least one body part of the subject — e.g., where relative motion can include a smartphone moving in a substantially circular motion, a smartphone remaining stationary and a subject moving, or a smartphone moving and a subject remaining stationary.
  • Feature 6 Determining the presence of the medical condition comprises: determining the presence of a sleep apnea condition.
  • Feature 7 Determining the presence of a sleep apnea condition comprises: determining the presence of an obstructive sleep apnea condition.
  • Feature 8 Determining the presence of the medical condition comprises: determining the presence of at least one of a hypertension condition or cardiovascular disease.
  • Feature 9 Obtaining the visual data representing the at least one body part of the subject comprises: obtaining visual data representing an oral cavity of the subject.
  • Feature 10 Obtaining the visual data representing the at least one body part of the subject comprises: obtaining visual data representing a craniofacial region.
  • Feature 11 Obtaining the visual data representing the at least one body part of the subject comprises: obtaining visual data representing an anterior cervical region.
  • Feature 12 Obtaining the visual data representing the at least one body part of the subject comprises: obtaining a first image of a front of a hand of the subject; and obtaining a second image of a back of the hand of the subject.
  • Actions include providing a cue for obtaining the visual data.
  • Feature 14: Providing the cue for obtaining the visual data comprises: providing one or more of an audio cue or a visual cue.
  • Feature 15: Providing the audio cue comprises: activating a speaker to provide a language- based cue.
  • Feature 16 Providing the cue for obtaining the visual data comprises: providing an outline on a display of a body part corresponding to the at least one body part of the subject.
  • Actions include generating a first embedding representing at least a portion of the visual data; generating a second embedding representing at least a portion of the non-visual data; and generating a fused data set that combines the first embedding and the second embedding, wherein providing the data representing (i) the visual data and (ii) the non-visual data to the one or more machine learning models comprises: providing the fused data set to the one or more machine learning models.
  • Feature 20 Providing the data to the one or more machine learning models comprises: providing the data representing the visual data to a convolutional neural network (CNN); and providing the data representing the non-visual data to a recurrent neural network (RNN).
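  • The feature-level fusion described above (a first embedding for visual data from a CNN, a second embedding for non-visual data from an RNN, combined into a fused data set) could look roughly like the following PyTorch sketch; the layer sizes, the GRU branch, and the classifier head are assumptions chosen only for illustration.

```python
import torch
import torch.nn as nn


class FusionModel(nn.Module):
    """Illustrative fusion of a CNN branch (visual) and an RNN branch (non-visual)."""

    def __init__(self, num_classes: int = 2):
        super().__init__()
        # CNN branch for images (e.g., face or oral-cavity frames)
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),        # -> (batch, 32)
        )
        # RNN branch for sequential non-visual data (e.g., audio features, sensor series)
        self.rnn = nn.GRU(input_size=8, hidden_size=32, batch_first=True)
        # Classifier over the fused embedding
        self.head = nn.Linear(32 + 32, num_classes)

    def forward(self, image: torch.Tensor, sequence: torch.Tensor) -> torch.Tensor:
        visual_emb = self.cnn(image)                      # first embedding
        _, hidden = self.rnn(sequence)
        non_visual_emb = hidden[-1]                       # second embedding
        fused = torch.cat([visual_emb, non_visual_emb], dim=1)  # fused data set
        return self.head(fused)
```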
  • Feature 21 Actions include in response to determining presence of the medical condition for the subject, generating recommendation data for one or more therapeutic interventions; and transmitting the recommendation data to an electronic device.
  • Feature 22 Obtaining the visual data representing the at least one body part of the subject comprises: obtaining an image of a particular body part; and extracting one or more features from the image of the particular body part, wherein the visual data includes the extracted one or more features.
  • Feature 23 Extracting the one or more features from the image of the particular body part comprises: extracting points representing a face of the subject, and wherein obtaining the visual data comprises: generating, using the extracted points, a face model of the subject.
  • Feature 24 Obtaining the non-visual data comprises: obtaining audio data of the subject speaking.
  • Feature 25 Obtaining the audio data comprises: obtaining first audio of the subject speaking; and obtaining second audio of the subject speaking with a speech restriction.
  • Feature 26 Obtaining the second audio of the subject speaking with the speech restriction comprises: obtaining audio while a nasal passage of the subject is restricted.
  • Actions include training the one or more machine learning models using (i) a set of training data representing a number of subjects and (ii) a set of ground truths representing known medical conditions for each of the number of subjects.
  • Actions include generating the set of ground truths, wherein generating the set of ground truths comprises: obtaining a set of medical reports that include data indicating the medical conditions for each of the number of subjects; and providing the set of medical reports to a trained language model.
  • Actions include generating an embedding for at least a portion of the set of medical reports; and storing the embedding for at least the portion of the set of medical reports in memory, wherein obtaining the set of medical reports that include data indicating the medical conditions for each of the number of subjects comprises: querying the memory for the embedding representing at least the portion of the set of medical reports.
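  • A minimal sketch of the report-embedding store described above: embeddings are generated for medical reports, stored in memory, and later queried when assembling ground truths. The embed_text placeholder and the in-memory dictionary are assumptions standing in for whatever language model and storage the system actually uses.

```python
import numpy as np

# embed_text() is a placeholder for a trained language/embedding model.
def embed_text(text: str, dim: int = 64) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))  # toy embedding, not a real model
    return rng.standard_normal(dim)


report_store: dict[str, np.ndarray] = {}   # in-memory store of report embeddings


def index_report(report_id: str, report_text: str) -> None:
    report_store[report_id] = embed_text(report_text)       # store embedding in memory


def query_reports(query_text: str, top_k: int = 3) -> list[str]:
    """Return the report ids whose embeddings are most similar to the query."""
    q = embed_text(query_text)
    scored = sorted(
        report_store.items(),
        key=lambda kv: -float(np.dot(kv[1], q) /
                              (np.linalg.norm(kv[1]) * np.linalg.norm(q) + 1e-9)))
    return [report_id for report_id, _ in scored[:top_k]]
```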
  • another innovative aspect of the subject matter described in this specification can be embodied in methods that include the actions of receiving an input selecting a medical condition from a plurality of medical conditions available for diagnosis; in response to receiving the input, accessing stored data corresponding to the plurality of medical conditions and a plurality of machine learning models that are configured to determine presence of the plurality of medical conditions; obtaining from the stored data, using the received input, data corresponding to the selected medical condition and at least one trained machine learning model that is configured to determine presence of the selected medical condition; providing medical data representing a subject to the at least one trained machine learning model; and determining presence of the selected medical condition for the subject using output of the at least one trained machine learning model, wherein the output of the at least one trained machine learning model is generated based on the at least one trained machine learning model processing the medical data.
  • Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.
  • Actions include receiving a second input selecting a different second medical condition from the plurality of medical conditions available for diagnosis; in response to receiving the second input, obtaining from the stored data using the second input, second data corresponding to the second medical condition and a trained second machine learning model that is configured to determine presence of the second medical condition, wherein at least one of the second data or the trained second machine learning model is respectively different from the data corresponding to the selected medical condition or the at least one trained machine learning model configured to determine presence of the selected medical condition; providing second medical data representing a second subject to the trained second machine learning model; and determining presence of the second medical condition for the second subject using output of the trained second machine learning model, wherein the output of the trained second machine learning model is generated based on the trained second machine learning model processing the second medical data.
  • Feature 2 The first subject and the second subject are the same subject.
  • the technology described in this specification can be implemented so as to realize one or more of the following advantages.
  • the described techniques reduce the expense and equipment typically incurred for medical condition diagnosis, such as sleep apnea diagnosis.
  • subjects can provide data, using techniques described, to one or more trained machine learning models.
  • the machine learning models can be trained to identify one or more medical conditions using the provided data.
  • FIG. 1 shows an example medical condition determination system.
  • FIG. 2 shows another example medical condition determination system.
  • FIG. 3 shows example processes of data capture for a medical condition determination system.
  • FIG. 4 shows an example system for processing medical reports.
  • FIG. 5 shows additional elements of an example condition identifier engine.
  • FIG. 6 shows an example subgraph process for risk probability predictions.
  • FIG. 7 is a flowchart of an example process for medical condition identification.
  • FIG. 8 is a flowchart of an example process for a medical condition identification platform.
  • This specification describes techniques for improving medical screening, diagnosis and support by using one or more machine learning models, trained using multi-modal data, to predict the presence of a medical condition. Such techniques can reduce equipment and expense incurred by traditional diagnostic methods while providing accurate medical results. The described techniques can facilitate low-cost, non-invasive and easy to administer medical diagnostics. Medical screening and/or diagnostics can include predictions of medical conditions, recommendations for therapeutic pathways, or both. The medical conditions include obstructive sleep apnea, hypertension, pulmonary hypertension, risk of stroke, cardiovascular disease, Type 2 Diabetes, depression, anxiety, or a combination of these among others.
  • the techniques can be used to predict the likelihood of a person suffering from sleep apnea, such as OSA.
  • Obstructive sleep apnea (OSA), a condition characterized by partial or complete obstruction of the upper airway during sleep, is known, when left untreated, to have a significant impact on cardiovascular health, contributing to heart disease, hypertension, irregular heart rhythms, and an increased risk of stroke.
  • Intermittent hypoxia and sleep fragmentation, both associated with OSA, have been shown to disrupt glucose metabolism, thereby increasing the risk of developing type 2 diabetes, impairing cognitive performance, increasing the risk of depression, and decreasing overall quality of life.
  • While generally prevalent in middle-aged and older adults, among men, among obese subjects, and increasingly among certain ethnic groups (African American, Mexican, South Asian), OSA has also been observed in children and in pregnant women. Screening for OSA enables detection and early intervention through therapy, lifestyle modifications, and surgery.
  • OSA can be a debilitating sleep disorder that affects an estimated 26% of US adults between the ages of 30 and 70.
  • OSA is a heterogeneous condition that can emerge from numerous comorbidities and anatomical variations. Rates of OSA prevalence, diagnosis, and care are not homogeneous across patient demographics. While several of the strongest OSA risk factors have an established genetic basis, notable risk factors can include environmental or social exposures and demographic classifications (e.g., sex, age, gender, race, or ethnicity).
  • OSA prevalence can vary widely across patient demographics, e.g., as children, African Americans are 4-6 times more likely to suffer from OSA than white patients; observed OSA severity is more than 20% greater in African Americans; and elderly African Americans are more than 2.1 times more likely to have severe OSA compared to white patients of a similar age.
  • Other groups disproportionately affected by OSA include Southeast Asians, Hispanics, and Native Americans.
  • In OSA epidemiology, patients from disproportionately affected demographics are generally less likely to have access to OSA care than others, e.g., their white counterparts.
  • the disclosed techniques can include gathering multimodal data from subjects.
  • Multimodal data can include imagery and video capturing the subjects’ cervico-cranio-facial complex from different viewpoints, voice samples that help assess their tonal quality while uttering a set of pre-selected phrases, or sleep questionnaires that gather data on their sleep quality and medical history, or a combination of these among others.
  • Multimodal data can include oral cavity imagery, such as images representing lower teeth, upper teeth, upper palate, tongue, underside of the tongue, posterior pharyngeal wall, or a combination of these among others.
  • Multimodal data can include generated content from applications, data captured from wearable devices, such as watches, rings, or other wearables, or data from testing equipment, such as SPO2 sensors.
  • techniques include capturing imagery and video gathered from within the subject’s oral cavity.
  • Factors such as recessed jaw, facial asymmetry, maxillary insufficiency, mandibular insufficiency, increased neck circumference, increased tendency towards oral breathing can indicate constrictions in the upper airway anatomy that contribute to OSA.
  • Factors can include other data, such as data indicating enlarged tonsils, enlarged tongue, crowding of the teeth, high arched palate, bruxism, or a combination of these, among others.
  • techniques include learning models —such as supervised deep learning models or other types of machine learning models — that process multimodal data.
  • the learning models can leverage the power of fusion in convolutional neural networks to combine and process multimodal data (e.g., as and when available) for effective detection of a medical condition and characterization of the onset of the medical condition.
  • a system can facilitate multimodal data collection.
  • a system can include software to capture data (e.g., text, voice, image, video) of subjects.
  • the system can use the data to help screen subjects for one or more medical conditions, such as OSA, high blood pressure, diabetes, heart disease, abnormal heart rhythm, stroke, or a combination of these among others.
  • one or more parameters of the system can be adjusted by a user or system administrator to select one or more different medical conditions.
  • the system can accordingly be configured to collect and analyze data corresponding to the selected medical condition(s).
  • learning-based models can be developed based on data gathered by the system and ground truth labels.
  • ground truth labels are provided by a physician. Additionally or alternatively, in some implementations, ground truth labels are retrieved from electronic health records, or obtained from polysomnography test results or other test output. In some implementations, ground truth labels are generated by one or more computer algorithms.
  • the system can operate as a screening application. The system can perform data collection, e.g., before, during, or after, a screening operation.
  • FIG. 1 shows an example medical condition determination system 100.
  • the system 100 includes a data capture engine 104 and a condition identifier engine 108.
  • the system 100 captures multi-modal data, such as multi-modal data 106, and generates an identified medical condition 110.
  • the multi-modal data 106 can include image data, questionnaire data, audio data, or a combination of these among others.
  • the multimodal data 106 can include generated data, such as data from applications, questionnaires, or wearable sensors.
  • the identified medical condition 110 can include a score indicating a likelihood that a subject has a particular medical condition, such as medical condition 102.
  • the data capture engine 104 captures data in a sequence.
  • the data capture engine 104 can capture a first item of data and then, in response to successfully capturing the first item of data, capture a second item of data.
  • the data capture engine 104 can capture data in a sequence to generate the multi-modal data 106.
  • the data capture engine 104 can obtain questionnaire data, e.g., by prompting a user for audio or textual responses to one or more questions.
  • the data capture engine 104 can obtain one or more scans of a subject’s body, e.g., images or video of a subject’s head. Images or video of a subject’s head can include images of a subject’s oral cavity.
  • the data capture engine 104 can obtain audio of a subject, e.g., an audio recorded using a microphone of the data capture engine 104 of a subject speaking a word or phrase.
  • a sequence of body scans can include one or more of: a scan of a subject’s head looking at the camera, a left side profile, a right side profile, or an upward looking pose.
  • An upward looking pose can include a scan inside the mouth for intraoral cavity assessment.
  • the data capture engine 104 can capture images of a subject’s body in any number of different poses. Each body pose can be captured as image or video data and can be included in the multi-modal data 106 for processing.
  • the data capture engine 104 captures data in alternative sequences.
  • the data capture engine 104 can capture audio recordings, followed by images, followed by questionnaire data.
  • the data capture engine 104 can capture a single modality of data and generate the multi-modal data 106 from a single mode of data, e.g., a set of visual images or video taken of a subject’s body.
  • the data capture engine 104 can capture different data, e.g., different sequences of data capture or different modes of data, for different medical conditions to be identified.
  • the data capture engine 104 captures data without instigation or involvement by a user (e.g., a subject or a medical professional).
  • the data capture engine 104 can obtain data from electronic health records or other medical data from another system.
  • the data capture engine 104 invokes one or more application programming interface (API) service calls to obtain data.
  • the data capture engine 104 can perform one or more pre-screening operations.
  • the data capture engine 104 can obtain one or more health records, e.g., from a dataset of one or more health records, and parse the data, e.g., to identify likely candidates with a risk of a medical condition, such as OSA.
  • the system 100 can then generate a notification or interface to engage with identified persons at risk for a medical condition, e.g., based on medical data, such as images, medical history, or a combination of these among others.
  • the data capture engine 104 obtains an indication of a medical condition to be identified.
  • the data capture engine 104 can obtain the medical condition 102.
  • the data capture engine 104 can be configured to capture data for multiple different medical conditions.
  • the data capture engine 104 can be configured to adjust a process of data capture depending on the medical condition obtained. For example, if the medical condition 102 is sleep apnea, the data capture engine 104 can be configured to capture data indicative of sleep apnea. If the medical condition 102 is cardiovascular disease, the data capture engine 104 can be configured to capture data indicative of cardiovascular disease.
  • the data capture engine 104 can capture any type of data that is indicative of a medical condition, such as the medical condition 102.
  • the data capture engine 104 includes guided data collection.
  • the data capture engine 104 can be configured to activate speakers to play a voice that instructs a subject to obtain data.
  • the data capture engine 104 can activate speakers to play a recording or computer-generated voice that says, “face the camera and position your head within the outline displayed on the screen,” e.g., for a case where a device for data capture includes a screen and a camera, such as a smartphone, laptop, or desktop computer with connected peripheral devices.
  • the data capture engine 104 can be configured to display visual indicators to help a subject obtain data.
  • the data capture engine 104 can be configured to display an outline of a head on a screen facing a subject that is also showing a live feed of the subject.
  • the display can be the display of a smartphone and the camera capturing video of the subject can be a front facing camera of the smartphone.
  • the data capture engine 104 activates audio or visual cues to guide a subject to have a particular facial expression, such as a neutral facial expression.
  • the data capture engine 104 includes a head pose estimator process.
  • the data capture engine 104 can use one or more visual detection machine learning models to identify features of a subject’s face captured in one or more captured images.
  • the data capture engine 104 can determine whether measurements satisfy one or more thresholds. For example, the data capture engine 104 can determine if roll, pitch, yaw or other angles of a subject’s head satisfy one or more thresholds or are within one or more value ranges.
  • the data capture engine 104 can capture one or more additional images while a subject maintains a current head pose.
  • the data capture engine 104 can prompt, e.g., using voice or visual prompts, the subject to remain still for additional image captures.
  • the data capture engine 104 captures video.
  • the data capture engine 104 captures images in a buffer and removes images that do not yield measurements that satisfy one or more thresholds.
  • the data capture engine 104 can capture one or more images using one or more cameras.
  • the data capture engine 104 can run a head pose estimator on one or more of the captured images.
  • the data capture engine 104 can determine whether or not the measurements satisfy one or more thresholds.
  • the data capture engine 104 can remove the one or more captured images from memory, e.g., an image buffer.
  • the data capture engine 104 can retain the corresponding one or more images for further processing, e.g., by one or more models for medical condition identification. By storing images in the buffer, the data capture engine 104 can reduce an amount of time a subject has to maintain a given pose.
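  • The buffered head-pose check described above might be sketched as follows; the roll/pitch/yaw limits, the estimate_pose helper, and the camera interface are assumptions for illustration only.

```python
from collections import deque

# Hypothetical pose limits (degrees) for a "face the camera" capture step.
POSE_LIMITS = {"roll": 10.0, "pitch": 15.0, "yaw": 15.0}


def pose_ok(pose: dict) -> bool:
    """True if roll, pitch, and yaw are all within their allowed ranges."""
    return all(abs(pose[k]) <= limit for k, limit in POSE_LIMITS.items())


def capture_loop(camera, estimate_pose, max_frames: int = 100):
    """Keep only frames whose estimated head pose satisfies the thresholds."""
    buffer = deque(maxlen=10)
    for _ in range(max_frames):
        frame = camera.read()
        if frame is None:
            break
        if pose_ok(estimate_pose(frame)):
            buffer.append(frame)   # retain for the medical-condition models
        # frames that fail the thresholds are simply not kept in the buffer
        if len(buffer) == buffer.maxlen:
            break
    return list(buffer)
```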
  • the data capture engine 104 obtains neutral expressions of a subject. For example, the data capture engine 104 can identify and obtain frames with a subject’s face. The data capture engine 104 can process the frames, e.g., running a fiducial feature extractor on the subject’s face. The data capture engine 104 can monitor for a subset of features, such as features around the eyes or mouth. If the subset of features indicates the subject is not in a neutral state, the data capture engine 104 can provide a notification to the subject to make a corresponding change to their facial expression.
  • the data capture engine 104 can flag the frame and either not use it for medical condition determination or prompt a subject or user for additional data, such as a new image to be taken.
  • the data capture engine 104 notifies a subject to correct an element of data capture to improve data capture quality.
  • the data capture engine 104 can activate speakers to notify a subject to center their face within one or more captured images.
  • the data capture engine 104 can display visual indicators on a screen directing a subject to change a head position, e.g., so that portions of the head indicative for a medical condition diagnosis are more visible.
  • the data capture engine 104 is configured to capture data indicating craniofacial features, anterior cervical features, intraoral cavity features, or a combination of these among others.
  • the data capture engine 104 can be configured to direct a subject to obtain images of their head.
  • the data capture engine 104 can use activated speakers or display elements to direct a user.
  • display elements indicate a scale (e.g., distance to camera) at which images or video are preferably taken.
  • the data capture engine 104 can illustrate sub-optimal images to inform a subject how not to obtain data, e.g., showing a representation of a subject capturing images at the wrong distance or in dim lighting conditions, among others.
  • the data capture engine 104 guides — e.g., using audio or visual cues — a subject to raise their chin such that their neck is largely visible to a camera. Desired neck position can be illustrated on a screen — e.g., by an avatar, actor, or other visual indication.
  • Head pose can be estimated by the data capture engine 104 using a head pose detector. Once measurements of one or more images — e.g., roll, pitch, yaw angles — satisfy one or more threshold requirements (such as the angles being in a range desired for ‘chin-up neck shots’), the data capture engine 104 can prompt the subject to remain still. The data capture engine 104 can gather additional one or more images as the subject remains still.
  • the data capture engine 104 stores images as the subject is positioning themselves corresponding to the cues and, once it obtains an image where one or more angles or features of the image satisfy one or more thresholds, the data capture engine 104 can store the corresponding one or more images for processing without requiring the subject to hold a pose for an extended period of time. Images can be recorded periodically while a subject is in frame. Once an image satisfies one or more threshold requirements — e.g., angles of the head satisfy one or more angle thresholds — an image can be saved for processing without further delay or pose holding by the subject. Other poses of a body, such as poses of the head or other body parts, such as a hand, wrist, leg, arm, among others, can be captured by the data capture engine 104 in a similar manner.
  • the data capture engine 104 notifies a subject if one or more captured images do not satisfy one or more quality thresholds.
  • the data capture engine 104 can capture one or more images using one or more cameras and determine one or more quality factors, such as pose angles, brightness, distance to camera, or a combination of these among others.
  • the data capture engine 104 can determine if one or more quality factors do not satisfy one or more thresholds.
  • the data capture engine 104 can determine that a brightness threshold is not satisfied if a subject captures an image in a dark room.
  • the data capture engine 104 can notify the subject.
  • the data capture engine 104 can notify the subject that one or more quality factors have not satisfied one or more thresholds.
  • the data capture engine 104 can notify the subject to make one or more alterations to a data capture to improve the quality factors, e.g., turn on a light in the room. Notifications can include visual or auditory cues.
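  • A simple sketch of the quality-factor check described above, using mean brightness and image size as stand-ins for the engine's quality factors; the thresholds and hint wording are assumptions.

```python
import numpy as np

BRIGHTNESS_MIN = 60.0    # assumed threshold on mean grayscale intensity (0-255)


def check_quality(image: np.ndarray) -> list[str]:
    """Return human-readable hints when simple quality factors are not satisfied."""
    hints = []
    gray = image.mean(axis=2) if image.ndim == 3 else image
    if gray.mean() < BRIGHTNESS_MIN:
        hints.append("Image looks too dark; please turn on a light or move to a brighter room.")
    # Rough proxy for "too far from camera": very small capture resolution.
    if min(image.shape[:2]) < 480:
        hints.append("Face appears too small; please move closer to the camera.")
    return hints
```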
  • the data capture engine 104 provides a questionnaire.
  • the data capture engine 104 can generate and provide for display a set of one or more questions and an interface configured to allow a subject to provide answers to the one or more questions, such as radio buttons or textboxes.
  • the questions can be a predetermined list, e.g., obtained from a server communicably connected to the data capture engine 104.
  • the data capture engine 104 generates questions using one or more generative models.
  • the data capture engine 104 can include a generative model configured to ask questions for a particular subject or a particular medical condition.
  • the questions can be designed to elicit one or more indicators of a medical condition.
  • a subject’s demographic information can be used to adapt questions.
  • a system can opt to skip one or more questions.
  • the system 100 uses generative models to ask questions. For example, the system 100 can use previously obtained data to personalize a set of questions, e.g., using a generative model, such as a language model. If abnormal features are present in the data, the system 100 can include questions to elicit responses that further describe those features, potentially revealing underlying diseases or conditions related to a medical condition to be determined for a subject.
  • the data capture engine 104 provides questions to prompt a yes or no response from a subject.
  • the data capture engine 104 can provide a question that asks a subject: “have you been told that you snore,” e.g., in an example case for identifying a medical condition of sleep apnea.
  • the data capture engine 104 can store results for yes/no questions in binary format, where 0 represents one response and 1 represents the other, e.g., 0 can represent no and 1 can represent yes.
  • such storage can effectively compress the captured data — e.g., where traditional responses might be recorded using string or integer values which occupy multiple bytes.
  • a sleep questionnaire can include one or more of the following questions: “what is your height?”, “what is your weight?”, “have you been told that you snore?”, “do you feel tired or sleepy during the day?”, “do you have any trouble falling or staying asleep?”, or “do you have any one or more of the following conditions:”, where conditions can include, e.g., high blood pressure, diabetes, heart disease, abnormal heart rhythm, or stroke.
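  • The compact yes/no encoding mentioned above can be carried one step further by packing several binary answers into a single integer, one bit per question; the questions and packing order below are illustrative assumptions, not a format the specification prescribes.

```python
QUESTIONS = [
    "have you been told that you snore?",
    "do you feel tired or sleepy during the day?",
    "do you have any trouble falling or staying asleep?",
    "do you have high blood pressure?",
]


def pack_answers(answers: list[bool]) -> int:
    """Pack yes/no answers into one integer, one bit per question (1 = yes, 0 = no)."""
    packed = 0
    for i, yes in enumerate(answers):
        if yes:
            packed |= 1 << i
    return packed


def unpack_answers(packed: int, n_questions: int) -> list[bool]:
    return [bool(packed >> i & 1) for i in range(n_questions)]


# Example: snores and is sleepy during the day -> bits 0 and 1 set -> value 3
assert pack_answers([True, True, False, False]) == 0b0011
```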
  • the data capture engine 104 requests subjects to provide a voice sample.
  • the data capture engine 104 can use visual or auditory cues to direct a subject to speak a word or phrase (e.g., one that captures one or more intonations).
  • the data capture engine 104 requests that a subject speak with a modification.
  • the data capture engine 104 can request that a subject speak a word or phrase while pinching their nose or performing other actions to modify the speech to provide additional data for processing.
  • the data capture engine 104 can capture voice data representing a subject speaking a word or phrase with a modification and without a modification.
  • the data capture engine 104 can use one or more microphones to capture a subject speaking a word or phrase.
  • the data capture engine 104 prompts a subject to present their oral cavity area.
  • the data capture engine 104 can prompt a subject to open their mouth wide and say “aahh”.
  • the data capture engine 104 can use voice or visual cues to help improve data capture.
  • the data capture engine 104 can capture video (e.g., approximately 5 seconds) of a subject’s oral cavity.
  • the video can include representations of tonsils, tongue, teeth arrangement, how arched the palate is, among other features of the subject.
  • the data capture engine 104 supports self-capture by a subject, e.g., using a front facing camera of a device, such as a smartphone.
  • the data capture engine 104 uses a third party assisted system to capture data for a given subject. For example, a mechanism or a third-party person can help obtain data, e.g., using a rear facing camera of a device or specialized medical equipment.
  • data capture can be self-administered or administered by a medical professional.
  • a number of oral features captured in images of an oral cavity are analyzed using one or more models.
  • features can include one or more of: enlarged tonsils, elongated uvula, soft palate, Mallampati score (e.g., visibility of oropharynx), tongue size or position, narrow dental arches, retrognathia or micrognathia, crowded oropharynx, bruxism, mouth breathing (e.g., indication of dry mouth, inflamed gums, or other indications).
  • the system 100 can use oral cavity features to determine one or more output results, such as a likely medical diagnosis.
  • the system 100 can include elements to prompt a user to display portions of their oral cavity.
  • Prompts can include directing a subject to stick their tongue out, open their mouth, and direct a camera of the data capture engine 104 at a lower, upper, back, or other part of the mouth (e.g., lower teeth, upper teeth, upper palate, uvula, tonsils, opening to the pharynx, among others).
  • the data capture engine 104 can detect one or more malocclusions.
  • the data capture engine 104 identifies an optimal frame of one or more images captured.
  • the data capture engine 104 can use one or more analyzing processes to determine an image that is most conducive to accurate detections, such as sharpness detection, centering and alignment, contrast or edge detection, or a combination of these among others.
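  • One common way to score which frame is most conducive to accurate detections is sharpness via the variance of the Laplacian; the OpenCV sketch below illustrates that idea and is not necessarily the analysis the engine performs.

```python
import cv2
import numpy as np


def sharpness(frame: np.ndarray) -> float:
    """Variance of the Laplacian: higher means sharper (less motion blur)."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return float(cv2.Laplacian(gray, cv2.CV_64F).var())


def best_frame(frames: list[np.ndarray]) -> np.ndarray:
    """Select the sharpest frame from, e.g., a short oral-cavity clip."""
    return max(frames, key=sharpness)
```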
  • data captured is pre-processed before being provided to the condition identifier engine 108.
  • data can be provided to one or more models, such as a convolutional neural network, to extract features from the data.
  • the extracted data can be included in input data, such as the multi-modal data 106, for the condition identifier engine 108.
  • the data capture engine 104 includes a device used by a medical professional.
  • a medical professional can use a smartphone that captures images of a patient.
  • the data can be used by the system 100 to provide a likely medical diagnosis, or a set of potential medical diagnoses, for review by the medical professional.
  • multiple diagnoses can be provided to subjects for review by the subjects.
  • the data capture engine 104 obtains data indicating a subject’s tendency towards oral breathing versus nasal breathing.
  • the data capture engine 104 can capture image data indicating an openness of a subject’s mouth during a rest state.
  • the data capture engine 104 can perform face detection using one or more images of a subject.
  • the data capture engine 104 can fit a three-dimensional face model to a captured image representing a face of the subject.
  • a face model can include a dense set of vertices identified on a subject’s face, e.g., a subject’s mouth.
  • Coordinates of such vertices can be used to detect instances when a subject’s mouth was open or closed, e.g., when a subject is sleeping.
  • the data capture engine 104 can include a sensor that is activated during a subject’s sleep, e.g., to capture one or more images.
  • a subject can prop a camera of the data capture engine 104 to face themselves while the subject sleeps.
  • data indicating that a subject’s mouth is open while sleeping can indicate that the subject has a higher likelihood of oral breathing which can be correlated with sleep apnea or other medical conditions.
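  • A sketch of how mouth openness could be derived from fitted face-mesh vertices to estimate oral-breathing tendency during sleep; the landmark names and the openness ratio threshold are assumptions for illustration.

```python
import numpy as np

MOUTH_OPEN_RATIO = 0.35   # assumed threshold on lip gap relative to mouth width


def mouth_is_open(upper_lip: np.ndarray, lower_lip: np.ndarray,
                  left_corner: np.ndarray, right_corner: np.ndarray) -> bool:
    """Compare vertical lip gap to horizontal mouth width using mesh vertex coordinates."""
    gap = np.linalg.norm(upper_lip - lower_lip)
    width = np.linalg.norm(left_corner - right_corner) + 1e-9
    return gap / width > MOUTH_OPEN_RATIO


def oral_breathing_fraction(frames_vertices: list[dict]) -> float:
    """Fraction of sleep frames in which the mouth appears open."""
    open_flags = [mouth_is_open(v["upper_lip"], v["lower_lip"],
                                v["left_corner"], v["right_corner"])
                  for v in frames_vertices]
    return sum(open_flags) / max(len(open_flags), 1)
```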
  • the data capture engine 104 includes one or more wearable or mobile devices.
  • the data capture engine 104 can capture data indicating health behavioral data, otherwise referred to as lifestyle data.
  • the data capture engine 104 captures one or more of the following: sleep patterns, physical activity, or body weight and body composition.
  • a wearable device or mobile device of the data capture engine 104 can include sleep tracking features, e.g., that can monitor one or more of sleep duration, quality, disruptions, irregularity in sleep patterns, decreased sleep efficiency, snoring, or frequent nighttime awakenings.
  • a wearable device or mobile device of the data capture engine 104 can capture data indicating physical activity levels of a subject — e.g., step counts, accelerometer data, elevation change data, GPS pings, among others. Decreased physical activity can increase the likelihood of certain medical conditions, such as sleep apnea.
  • a wearable device measuring various data for a subject such as heart rate, blood oxygen level, blood pressure, disturbances to sleep, body movements during sleep, among others, can provide data to, or as part of, the data capture engine 104.
  • a wearable device or mobile device of the data capture engine 104 can capture data representing body weight or body composition.
  • a wearable device or mobile device of the data capture engine 104 can track body weight, body mass index, or body composition trends which can, e.g., be correlated with one or more risks of medical conditions.
  • the data capture engine 104 captures biological vital readings.
  • the data capture engine 104 can capture heart rate, blood oxygen levels, or glucose levels.
  • a wearable device or mobile device of the data capture engine 104 can capture biological vital readings.
  • a subject can provide permission to obtain data from one or more devices, such as wearable or mobile devices.
  • a duration window for data collections by the data capture engine 104 can span days, weeks, months, among other time increments (e.g., the most recent few weeks).
  • the system 100 adjusts data capture, e.g., based on a likelihood of a medical condition.
  • the system 100 can adjust a granularity of a sleep questionnaire or a number of data types captured from a subject based on their OSA likelihood, their comorbidities, their prior health conditions, among other factors.
  • the system 100 can serve the purpose of gathering useful information that can help diagnose medical conditions, such as OSA, with more accuracy.
  • the system 100 can be used to identify one or more underlying factors that can lead to a medical condition, such as OSA. In some cases, the more serious a medical condition is anticipated to be (e.g., based on factors determined by the condition identifier engine 108 or obtained by the data capture engine 104), the more data points that are gathered by the data capture engine 104.
  • the data capture engine 104 provides a subject with a choice of data streams to provide.
  • the data-types that the data capture engine 104 can obtain from a subject may vary from one subject to another, e.g., due to various reasons such as (i) personal preference or permissions, (ii) lack of access to equipment needed for data capture (e.g., microphone or camera), (iii) subject might not have sufficient time to provide all data, (iv) an environment the subject is submitting data from might not be conducive for certain types of data collection.
  • the data capture engine 104 includes one or more models trained to determine a level of a medical condition risk.
  • the data capture engine 104 can include a model trained on one or more data types to predict a risk of a medical condition, such as OSA.
  • the data capture engine 104 can provide an interface for capturing additional data. For example, if the data capture engine 104 determines that a subject is at a high risk based on one or more detected factors of OSA, then the data capture engine 104 can provide an interface for the subject to provide additional data, e.g., for a more thorough analysis. Additional data can help the system combine data from multiple streams and can help improve accuracy of medical condition identification, risk model predictions, attribution predictions, among others.
  • a risk model of the data capture engine 104 is trained on different data than one or more medical condition detection models of the condition identifier engine 108.
  • a risk model of the data capture engine 104 can be trained on data that is relatively easy to provide, such as questionnaire data or data from connected devices such as wearables or mobile devices. In this way, the system 100 can reduce an amount of time subjects spend submitting data by requiring only high risk subjects to provide substantial data and allowing low risk subjects to provide less data.
  • the condition identifier engine 108 obtains the multi-modal data 106 generated by the data capture engine 104.
  • the condition identifier engine 108 includes data preprocessing stages.
  • the condition identifier engine 108 includes feature extraction processing stages.
  • the condition identifier engine 108 generates output, e.g., the identified medical condition 110, treatment actions 112, or a combination of these among others.
  • the condition identifier engine 108 enables a personalized medicine approach, e.g., through personalized output based on captured data of a given subject.
  • the condition identifier engine 108 generates the treatment actions 112 that include, e.g., one or more therapy pathways, such as oral appliance therapy, CPAP, an electric stimulation implant, pharmaceutical interventions, or a combination of these among others.
  • the system 100 can obtain data of a subject with mild to moderate OSA.
  • the system 100 can recommend that the subject is a good candidate for oral appliance therapy.
  • the system 100 recommends treatment based on measured compliance rates.
  • the system 100 can recommend that the subject is a good candidate for oral appliance therapy as the system 100 may determine, based on one or more stored or obtained values indicating compliance for one or more treatments, that the oral appliance therapy has better compliance than CPAP. Therefore, the system 100 can include oral appliance therapy in output, such as the treatment actions 112.
  • condition identifier engine 108 pre-processes obtained face images (e.g., frontal, profile, or other views of a subject’s face obtained by a camera of the data capture engine 104).
  • the condition identifier engine 108 can include one or more of the following processing elements: (i) a face detector, (ii) fiducial feature extractor, e.g., on a craniofacial complex of a subject, or (iii) a fiducial feature extractor, e.g., on an anterior cervical region of a subject.
  • preprocessing includes performing operations of a face detector on one or more images obtained by the data capture engine 104.
  • the condition identifier engine 108 can perform face detection using one or more trained models of a face detector to determine one or more regions that represent a face in one or more images.
  • the preprocessing can output a bounding box with coordinates (e.g., x1, y1, x2, y2). The bounding box can bound a representation of a face in one or more images.
  • the condition identifier engine 108 includes a deep learning model.
  • the deep learning model can be trained to fit face models, such as three-dimensional face models, to images that include faces, such as two-dimensional face images.
  • the fitted face models can include a face mesh that includes a dense set of vertices fit onto the obtained image of a subject’s face.
  • the face image can include one or more face poses, such as frontal or sagittal.
  • a set of fiducial landmarks can be used — e.g., those used for face anthropometric studies. Fiducial points on a subject’s face can be used for feature localization or face registration.
  • the condition identifier engine 108 can identify a dense set of feature locations using images of a face.
  • the condition identifier engine 108 can extract measurements from the face (e.g., both 2D measurements and 3D measurements, if a fitted face mesh provides X, Y, Z locations for every feature point identified or locations that satisfy one or more accuracy thresholds).
  • the measurements can be used by the condition identifier engine 108 to estimate relative positionings of facial features, such as mid-face or lower face, retrognathic mandible, prognathic maxilla, among others.
  • the condition identifier engine 108 includes models that are trained using a vision library that specializes in the detection of fiducial features and edges in the anterior cervical region. Face detection algorithms are typically trained to detect features of the craniofacial complex — e.g., the head, face, and oral cavity.
  • the anterior cervical region can include a neck of a subject — e.g., below inferior border of the mandible and above the jugular and clavicular notch of the manubrium.
  • One or more models of the condition identifier engine 108 can be trained to detect features within the anterior cervical region.
  • a training library used for such training can include labeled ground truth images that represent silhouettes in an anterior cervical region and fiducial features corresponding to the menton and cricoid elements in the anterior cervical region.
  • a training library for one or more models of the condition identifier engine 108 can include data obtained by one or more subjects or previously prepared data — e.g., from online image datasets or artificially generated data, such as artificially generated anterior cervical region data.
  • a subject obtains data for training.
  • a subject can present a cervico-craniofacial complex from multiple viewpoints, covering one or more of frontal, facing left, or facing right. “Facing left” and “facing right” can mean different things for different people, e.g., based on their neck mobility.
  • one or more models of the condition identifier engine 108 generate face meshes using obtained face images. For example, a model can generate a three-dimensional face mesh for each face represented in one or more images. A number N of points can be extracted for all fitted points. Each point can include one or more coordinates (e.g., x, y, and z). Face meshes extracted by the condition identifier engine 108 can be integrated into a single face mesh for a given subject. Face mesh integration can be performed by the condition identifier engine 108.
  • Face mesh integration can include integrating three-dimensional face mesh points where two coordinate values (e.g., x,y) correspond to feature coordinates on a frontal plane and a third coordinate value (e.g., z) corresponds to a feature depth (such as a sagittal plane).
  • the coordinates corresponding to the frontal plane can be better estimated when the face is near frontal pose.
  • depth of feature points on the left half of the face can be better estimated when the subject poses to expose the left half of the face, and vice versa.
  • Face mesh integration can include averaging one or more mesh data points.
  • integration can include averaging the (x,y) points.
  • Integration can include averaging the z points of a face mesh. For averaging the z points, weights on points on each side of the face can be computed separately. When the subject faces left, the weights computed can be used to adjust the depth of feature points on the right, and vice versa. The weight can be computed as max(0, yaw/(π/2)). Weights can be normalized following similar processes as described for the (x,y) coordinates. In this way, z coordinate values can be refined using data from multiple different poses.
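  • The weighting rule described above (weight = max(0, yaw/(π/2)), applied to the half of the face opposite the direction the subject is facing) might be implemented roughly as below; the mesh layout, the median-x test for the right half, and the small weight floor for hidden points are simplifying assumptions.

```python
import numpy as np


def integrate_meshes(meshes: list[np.ndarray], yaws: list[float]) -> np.ndarray:
    """Fuse per-pose face meshes (each of shape (N, 3)) into one mesh.

    (x, y) are averaged across poses; z is a weighted average where each pose
    contributes more depth for the half of the face it exposes. Assumes yaw is
    in radians, positive yaw means the subject faces left, and points with x
    above the mesh's median x lie on the right half of the face.
    """
    meshes = [np.asarray(m, dtype=float) for m in meshes]
    fused = np.mean(meshes, axis=0)                       # average (x, y, z) as a base
    z_acc = np.zeros(fused.shape[0])
    w_acc = np.zeros(fused.shape[0])
    for mesh, yaw in zip(meshes, yaws):
        w_right = max(0.0, yaw / (np.pi / 2))    # facing left exposes the right half
        w_left = max(0.0, -yaw / (np.pi / 2))    # facing right exposes the left half
        right_half = mesh[:, 0] > np.median(mesh[:, 0])
        weights = np.where(right_half, w_right, w_left) + 1e-3  # small floor for hidden points
        z_acc += weights * mesh[:, 2]
        w_acc += weights
    fused[:, 2] = z_acc / np.maximum(w_acc, 1e-9)         # normalized weighted depth
    return fused
```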
  • condition identifier engine 108 estimates a pose of a subject’s head.
  • the condition identifier engine 108 can use a face mesh, generated from one or more obtained face images, to estimate a pose of a subject’s head.
  • the condition identifier engine 108 can identify a search space for a front portion of an anterior cervical region using a height of a face mesh, e.g., either individually or with additional data, such as a tilt of the face mesh.
  • the condition identifier engine 108 can initialize a search space for an anterior cervical region as, e.g., 150 pixels wide and 100 pixels in height, located below a face mesh with the same tilt as that of the face mesh. In some cases, identifying a search space can improve efficiency or accuracy of a detection of an anterior cervical region, e.g., by reducing a region within which to search for the anterior cervical region.
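  • A sketch of initializing that anterior cervical search space from a fitted face mesh, using the 150 x 100 pixel region described above; placing the region just below the lowest mesh point is an assumed heuristic.

```python
import numpy as np


def neck_search_space(face_mesh_xy: np.ndarray, tilt_deg: float,
                      width: int = 150, height: int = 100) -> dict:
    """Place a width x height search region just below the face mesh, sharing its tilt.

    face_mesh_xy: (N, 2) pixel coordinates of fitted mesh vertices.
    Returns the region center, size, and rotation for downstream edge detection.
    """
    x_center = float(face_mesh_xy[:, 0].mean())
    y_bottom = float(face_mesh_xy[:, 1].max())            # lowest mesh point (chin area)
    return {
        "center": (x_center, y_bottom + height / 2.0),    # region sits below the chin
        "size": (width, height),
        "angle_deg": tilt_deg,                            # same tilt as the face mesh
    }
```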
  • condition identifier engine 108 uses a face mesh to identify a vertical axis of a subject’s face.
  • the condition identifier engine 108 can identify an axis that runs through the middle of a subject’s face.
  • the axis can split the face into two nearly symmetric halves. This axis can be referred to as a mid-facial axis.
  • the condition identifier engine 108 performs an edge detection process. For example, within an identified search space for an anterior cervical region, the condition identifier engine 108 can perform an edge detection process.
  • the condition identifier engine 108 can identify an anterior cervical region by detecting two nearly parallel lines — e.g., lines that are within a threshold level of being parallel.
  • the condition identifier engine 108 can identify lines that represent a silhouette of an anterior cervical region in a front face pose, such as the neck of a subject.
  • the condition identifier engine 108 can use Hough transforms or other similar line fitting tools on edge imagery.
  • the orientation of the mid-facial axis can be used as a starting point for a parameter search space, e.g., when invoking Hough transform or its equivalent algorithms for line fitting.
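One way to realize the edge detection and line fitting described above is sketched below with OpenCV's Canny detector and probabilistic Hough transform; the Canny and Hough thresholds and the angular tolerance are illustrative values, not parameters from this description.

```python
import cv2
import numpy as np

def find_neck_silhouette(gray_roi, mid_axis_angle, angle_tol=np.deg2rad(15)):
    """Find nearly parallel line segments in the anterior cervical search space.

    gray_roi:       grayscale crop of the identified search space.
    mid_axis_angle: orientation of the mid-facial axis in radians, used as the
                    starting point for the orientation search.
    """
    edges = cv2.Canny(gray_roi, 50, 150)
    segments = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=40,
                               minLineLength=30, maxLineGap=5)
    if segments is None:
        return []
    keep = []
    for x1, y1, x2, y2 in segments[:, 0]:
        theta = np.arctan2(y2 - y1, x2 - x1)
        # Keep only segments whose orientation is close (modulo pi) to the
        # mid-facial axis orientation.
        diff = (theta - mid_axis_angle + np.pi / 2) % np.pi - np.pi / 2
        if abs(diff) < angle_tol:
            keep.append((int(x1), int(y1), int(x2), int(y2)))
    return keep
```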
  • condition identifier engine 108 identifies a search space for an anterior cervical region using profile face images.
  • a search space can be defined using a size or orientation of a face mesh, e.g., that is fit onto an image of a face taken in profile pose.
  • the condition identifier engine 108 can perform an edge detection process within an identified search region and identify a region that captures elements of the anterior cervical region, such as a lower chin or neck.
  • the condition identifier engine 108 extracts one or more facial features.
  • the condition identifier engine 108 can extract data indicating a thyromental distance, thyromental angle, cricomental distance, or a combination of these, among others.
  • the thyromental distance can correspond to a distance from the thyroid notch to the chin, e.g., when a head is extended.
  • the thyromental distance can indicate a mandibular space.
  • the condition identifier engine 108 can identify aspects of Patil’s Triangle which can include thyromental distance or thyromental line. Aspects of Patil’s Triangle can indicate difficulty breathing — e.g., representing a risk factor for sleep apnea.
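As an illustration of one such extracted feature, a thyromental-distance measurement could be computed from two landmark points on a profile image, assuming a pixel-to-centimeter calibration factor is known; the coordinates and calibration value below are purely illustrative.

```python
import math

def thyromental_distance(chin_xy, thyroid_notch_xy, pixels_per_cm):
    """Euclidean distance between the chin point and the thyroid notch.

    Both points are (x, y) pixel coordinates from a profile image captured
    with the head extended; pixels_per_cm is an assumed calibration factor.
    """
    dx = chin_xy[0] - thyroid_notch_xy[0]
    dy = chin_xy[1] - thyroid_notch_xy[1]
    return math.hypot(dx, dy) / pixels_per_cm

# Illustrative values only; the resulting distance (in cm) can be fed to a
# downstream model as one element of a feature vector.
print(thyromental_distance((420, 610), (465, 735), pixels_per_cm=22.0))
```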
  • the condition identifier engine 108 includes one or more deep learning models or shallower models (e.g., machine learning models).
  • the one or more models of the condition identifier engine 108 can be trained to detect or characterize medical conditions, such as sleep apnea.
  • the one or more models can characterize onset of medical conditions using single modal data or multi-modal data, such as the multi-modal data 106 shown in the example of FIG. 1.
  • Data processed by the condition identifier engine 108 can be gathered by the data capture engine 104.
  • the data capture engine 104 can capture data using smartphone applications, sleep clinic data, or a combination of these, among others.
  • One or more models of the condition identifier engine 108 can be trained in a partially or fully supervised fashion.
  • Data types used for training can include data from a sleep questionnaire, a frontal face image, a sagittal face image, a voice sample (e.g., with or without restrictions to nasal passage), an image from within the oral cavity, or a combination of these among others.
  • Ground truth data used for training can include data indicating a physician’s diagnosis for a subject, where the training data includes corresponding input data for that subject.
  • Ground truth data can include results from an overnight sleep test.
  • Ground truth data can include an indication of whether or not a subject has a given medical condition, such as sleep apnea.
  • Ground truth data can include an underlying cause identified by a physician in determining the given medical condition, e.g., if a subject had sleep apnea, then an underlying cause can include one or more of obesity, protruded jaw, enlarged tongue, crowded teeth, deviated septum, double chin, enlarged tonsils, or a combination of these among others.
  • Ground truth data can include recommended interventions. For example, if a subject had sleep apnea, ground truth data can include what a medical professional recommended as an intervention, such as wearing a CPAP device, reducing BMI, using a mandibular advancement device, or surgical procedures to trim the uvula or tonsils, among others.
  • the system 100 can collect ground truth data when available.
  • the ground truth data can help build multi-tasking deep learning models, e.g., models that can perform operations, such as classifying a subject as one with a given medical condition, identifying one or more most likely causes of a condition, recommending interventions, or a combination of these among others.
  • Training and ground truth data can be specific to a medical condition to be identified.
  • ground truth data used for training to detect sleep apnea medical conditions can include data obtained using the Apnea Hypopnea Index (AHI) or a physician’s diagnosis of mild apnea, moderate apnea, severe apnea, or no apnea.
  • Ground truth data can include physician notes related to a subject, e.g., indicating obesity, high BMI, moderately high BMI, heart disease, previous medical issues, such as stroke, receded jaw, facial asymmetry, enlarged tonsils, enlarged tongue, crowded teeth, deviated septum, or a combination of these, among others.
  • the condition identifier engine 108 includes one or more language models trained to identify features from physician notes to be included in training data, e.g., as ground truth data.
  • models of the condition identifier engine 108 output data indicating whether a subject presents conditions within a class of a plurality of severity classes.
  • a model can be trained to classify a subject as either belonging to or not belonging to one or more classes.
  • the classes can include mild, moderate, severe, or no presence of a given medical condition, such as OSA.
  • Ground truth data can include a classification for one or more training data subjects.
  • one or more models of the condition identifier engine 108 are trained to attribute one or more elements of a subject’s current biological state to a given identified medical condition.
  • a model can perform OSA attribution (e.g., identifying a likely cause(s) for sleep apnea based on evidence presented or data obtained) apart from, or in addition to, OSA classification.
  • a model can, e.g., identify a subject’s weight as a primary factor that is causing OSA or another medical condition identified by the condition identifier engine 108.
  • Example types of attributions can include one or more of the following: (i) Enlarged Neck circumference: attributing a medical condition to an enlarged neck circumference, e.g., when a subject gains weight; (ii) Facial retrusion: Can be characterized by a lack of forward projection of facial structures, including maxilla and mandible; (iii) Maxillomandibular insufficiency: Underdevelopment of both lower and upper jaw; (iv) Retrognathia: A retruded or posteriorly positioned lower jaw, which affects alignment of upper and lower teeth; (v) Micrognathia: An abnormally small jaw contributing to reduced airway space; (vi) Overbite: Excessive vertical overlap of the front teeth; (vii) Narrow nasal passages: Constriction in the nasal airways impeding airflow while breathing; (viii) Deviated septum: Differences in the size of the two nasal passages can lead to OSA; (ix) High arched palate: Affects the size and shape
  • the example attributions above specifically consider a medical condition of OSA, but similar attributions could be generated, e.g., by the condition identifier engine 108, for other medical conditions. Other types of attributions not listed can be identified in some cases.
  • the condition identifier engine 108 includes one or more trained models.
  • Each of the one or more trained models can generate data that indicates the identified medical condition 110.
  • the one or more trained models can include a model that uses input from one or more sleep questionnaires. The input from the one or more sleep questionnaires can be mapped to a series of binary digits. With sufficient labeled data, the sleep questionnaire dataset can be fed as input to a model, such as a multilayer perceptron or multilabel support vector machine.
  • the model can include a lightweight medical condition classifier, such as an OSA classifier.
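A minimal sketch of such a lightweight questionnaire-based classifier is shown below using a small multilayer perceptron; the binary encoding of responses and the labels are illustrative stand-ins for real, physician-labeled data.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Each row is a sleep questionnaire mapped to binary digits (e.g., snoring,
# daytime tiredness, observed apneas, hypertension); labels are illustrative.
X = np.array([[1, 1, 0, 1, 0, 1],
              [0, 0, 0, 0, 0, 0],
              [1, 0, 1, 1, 1, 0],
              [0, 1, 0, 0, 0, 0]])
y = np.array([1, 0, 1, 0])  # 1 = condition present in ground truth, 0 = absent

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
clf.fit(X, y)

new_response = np.array([[1, 1, 1, 0, 1, 1]])
print(clf.predict_proba(new_response))  # per-class probability for a new subject
```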
  • the one or more trained models includes a model using input data from one or more time-series inputs — e.g., a subject’s wearable device data streams.
  • the input data can indicate sleep quality, pulse rate, blood glucose level, physical exercise, bodily composition (BMI related), or a combination of these among others.
  • the input data can be gathered over a period of time.
  • the model can be configured for time-series data.
  • the model can be a deep learning model that is trained to obtain data, such as data from one or more time-series inputs, and provide an indication of a medical condition.
  • the one or more trained models includes a model using input data representing frontal or sagittal views of a subject’s face.
  • both raw face images and one or more vertices of a face mesh can be provided to the model, which can include deep learning architecture.
  • the deep learning architecture can include a combination of one or more of: convolutional layers, convolutional blocks, skip connections, fully connected layers, activation functions, or pooling layers.
  • the loss functions can be configured such that the deep learning system can perform both medical condition identification (e.g., OSA classification) and medical condition attribution (e.g., OSA attribution).
  • the system 100 can use model quantization or model pruning techniques to build deep learning models that can run on the edge — e.g., on a device of the data capture engine 104.
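A sketch of a multi-task head that supports both condition classification and attribution, with a combined loss, is shown below in PyTorch; the feature dimension, number of classes, attribution labels, and loss weighting are illustrative assumptions rather than values from this description.

```python
import torch
import torch.nn as nn

class MultiTaskHead(nn.Module):
    """Shared trunk with two heads: condition classification and attribution.

    A sketch only; a full system would place this atop the convolutional
    architecture described above, operating on face images and mesh vertices.
    """
    def __init__(self, in_dim: int, n_classes: int = 4, n_attributions: int = 9):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU())
        self.classify = nn.Linear(128, n_classes)        # e.g., none/mild/moderate/severe
        self.attribute = nn.Linear(128, n_attributions)  # multi-label causes

    def forward(self, x):
        h = self.trunk(x)
        return self.classify(h), self.attribute(h)

model = MultiTaskHead(in_dim=256)
cls_loss = nn.CrossEntropyLoss()
attr_loss = nn.BCEWithLogitsLoss()

features = torch.randn(8, 256)                 # placeholder extracted features
severity = torch.randint(0, 4, (8,))           # ground-truth severity class
causes = torch.randint(0, 2, (8, 9)).float()   # ground-truth attribution labels

logits_cls, logits_attr = model(features)
loss = cls_loss(logits_cls, severity) + 0.5 * attr_loss(logits_attr, causes)
loss.backward()  # gradients for both tasks flow through the shared trunk
```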
  • the one or more trained models includes a model using one or more voice samples from a subject.
  • the model can include one or more deep learning elements.
  • the model can obtain two voice samples from a subject: one captured under usual settings and the other captured with restrictions to the nasal airway (such as nasal pinching).
  • the model can be trained to identify differences in intonations from the two samples and, e.g., detect obstructions that exist in the nasal airway.
  • obstruction detections can be included in the identified medical condition 110.
  • the one or more trained models includes a model that identifies if there are restrictions to an airway passage.
  • the model can obtain images from an oral cavity. Using the images, the model can identify if there are likely any restrictions to the airway passage. If restrictions are detected, the model can be configured to identify one or more factors that are attributed to the restrictions in the airway passage, such as (a) enlarged tonsils (b) enlarged tongue (c) high arched palate (d) crowded teeth, or a combination of these among others.
  • the one or more trained models includes a model trained on multimodal data, such as the multi-modal data 106.
  • the multi-modal data can include one or more of text (e.g., a questionnaire), an image (e.g., frontal, sagittal views of the face and neck, or views from within the oral cavity), life-style data gathered from wearables, audio, or a combination of these among others.
  • the model can be configured to perform medical condition classification or attribution or both.
  • the one or more trained models includes a model trained to detect one or more medical conditions using extracted elements from images of a body.
  • the system 100 can obtain images of a subject’s face. The images can include various representations of the subject’s face. Angles and measurements of the subject’s face can be extracted by the system 100 and used as input for a model to identify one or more medical conditions.
  • the system 100 extracts elements using a face mesh. Extracted elements, such as angles, aspect ratios, or other extracted data points from a subject’s face, can be provided to the model in an N dimensional vector, where N represents the number of elements extracted from the one or more images of the subject.
  • a support vector machine or random forest or similar such machine learning solution can be used to build a medical condition classifier or attributor model.
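A brief sketch of that approach: extracted angles and ratios form fixed-length feature vectors that are fed to a support vector machine or random forest; the random feature values and labels below are placeholders for real extracted measurements and ground truth.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((40, 12))          # N = 12 elements extracted per subject (placeholder)
y = rng.integers(0, 2, size=40)   # placeholder condition labels

svm = SVC(kernel="rbf", probability=True).fit(X, y)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

subject = rng.random((1, 12))
print(svm.predict_proba(subject), forest.predict_proba(subject))
```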
  • condition identifier engine 108 includes one or more trained models that include a deep learning architecture.
  • data is pre-processed by the data capture engine 104 and then provided as processed input to the condition identifier engine 108.
  • condition identifier engine 108 performs pre-processing and processing or just processing, e.g., using one or more trained models.
  • a model can be based on a U-Net model for segmenting different anatomical regions in the oral cavity. This model can start with an input layer that processes high-resolution 2D or 3D scans of the oral cavity.
  • An encoder path can include one or more convolutional layers with varying or increasing filter sizes, followed by Batch Normalization and ReLU activation.
  • the encoder path can include max-pooling layers to capture contextual information by reducing spatial dimensions. At the bottleneck, dilated convolutions with larger receptive fields can be used to capture multi-scale context effectively.
  • a decoder path can include upsampling layers using transposed convolutions or nearest-neighbor upsampling, followed by convolutional layers. Skip connections from the encoder to the corresponding decoder layers can help ensure recovery of spatial information lost during downsampling.
  • a final output layer can use a SoftMax activation function, e.g., to classify each pixel into a respective anatomical region.
  • Attention U-Net can integrate attention gates in skip connections, e.g., to focus on relevant features and suppress background noise.
  • Residual connections can facilitate training of deeper networks, while multi-scale features can capture details at various scales, e.g., through pyramid pooling or atrous spatial pyramid pooling (ASPP).
  • a hybrid loss function combining Dice loss with cross-entropy loss can address class imbalance and help improve segmentation accuracy.
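The hybrid loss mentioned above could be written as follows, combining cross-entropy with a soft Dice term; the weighting between the two terms is an illustrative choice.

```python
import torch
import torch.nn.functional as F

def hybrid_seg_loss(logits, target, dice_weight=0.5, eps=1e-6):
    """Cross-entropy combined with soft Dice loss for segmentation.

    logits: (B, C, H, W) raw network outputs; target: (B, H, W) class indices.
    """
    ce = F.cross_entropy(logits, target)
    probs = F.softmax(logits, dim=1)
    one_hot = F.one_hot(target, num_classes=logits.shape[1]).permute(0, 3, 1, 2).float()
    intersection = (probs * one_hot).sum(dim=(0, 2, 3))
    union = probs.sum(dim=(0, 2, 3)) + one_hot.sum(dim=(0, 2, 3))
    dice = (2 * intersection + eps) / (union + eps)
    dice_loss = 1 - dice.mean()
    return (1 - dice_weight) * ce + dice_weight * dice_loss
```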
  • the training process can incorporate data augmentation, transfer learning, and regularization techniques like dropout and weight decay to, e.g., ensure robust and precise segmentation of oral cavity anatomical regions.
  • the system 100 can use the maps for classification and medical condition detection, such as OSA.
  • specific metrics can be calculated from segmented regions, such as the size and volume of each anatomical structure, positional relationships, shape descriptors, and texture features. Functional features, like airway patency and dynamic changes in anatomical structures, can be assessed, e.g., if dynamic scans are available.
  • the system 100 can include classification models using these metrics, e.g., to predict the onset of OSA.
  • Feature selection techniques can identify the most relevant features, and classification models such as Support Vector Machines (SVMs), Random Forests, and Deep Learning models like CNNs or LSTMs can be trained.
  • a training dataset can be split into training, validation, and testing sets, e.g., with k-fold cross-validation ensuring robustness and generalizability. Hyperparameter tuning can be performed using grid search or random search.
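A compact example of that training protocol, with a held-out test split, 5-fold cross-validation, and a grid search over hyperparameters, is shown below; the feature matrix, labels, and parameter grid are illustrative placeholders.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((200, 10))         # illustrative metrics from segmented regions
y = rng.integers(0, 2, size=200)  # illustrative OSA / no-OSA labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

search = GridSearchCV(SVC(),
                      param_grid={"C": [0.1, 1, 10], "gamma": ["scale", 0.01]},
                      cv=5)        # 5-fold cross-validation on the training split
search.fit(X_train, y_train)
print(search.best_params_, search.score(X_test, y_test))
```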
  • condition identifier engine 108 includes a segmentation model and a classification model.
  • the condition identifier engine 108 can process raw data, e.g., captured by the data capture engine 104, and generate maps using the segmentation model.
  • the condition identifier engine 108 can extract features from the maps generated by the segmentation model.
  • the condition identifier engine 108 can provide the extracted features to the classification model.
  • a segmentation model can divide a face into various anatomical regions. Segmentation can be performed either in a learning-driven way or rule-based way.
  • in a learning-driven way, the system 100 can provide ground truth labels indicating which face region corresponds to the lower jaw, upper jaw, chin, mandible, maxilla, mouth, nose, nasal septum, among others.
  • in a rule-based way, the system 100 can include rules that indicate, e.g., that a face contour comprising a set of pre-defined indices forms a lower jaw, certain other pre-defined indices form an upper jaw, and other indices can form other regions.
  • segmented features can help classification.
  • the condition identifier engine 108 can generate or obtain a segmented face and use that to identify conditions such as micrognathia, retrognathia, deviated septum, prognathic maxilla, or a combination of these, among others.
  • the condition identifier engine 108 can be evaluated using metrics, such as accuracy, precision, recall, F1 score, and AUC-ROC curve. Output of the condition identifier engine 108 can be clinically validated, e.g., by comparing predictions with polysomnography results or other results depending on the medical condition.
  • the system 100 can be deployed in clinical settings or at home for real-time medical condition risk assessment, e.g., with continuous monitoring and updates to maintain and improve performance.
  • the multi-modal data 106 indicates age related changes to a subject.
  • the multi-modal data 106 can include data across time, e.g., across one or more years.
  • the condition identifier engine 108 can analyze changes across time to determine onset of a medical condition.
  • the condition identifier engine 108 can obtain data of a frontal and side-view of the subject’s face (e.g., one or more images). Images can be obtained, e.g., during a subject’s annual physical exam.
  • a face model in the form of a face mesh, e.g., comprising N vertices identified on the face, can be fit to the frontal and profile face images of the patient taken across years.
  • the face models across time can be aligned with one another, e.g., using a midpoint of the eyes or other feature in the images.
  • Changes between the images e.g., captured by changes in face models, can be determined by the condition identifier engine 108.
  • One or more trained models of the condition identifier engine 108 can use the changes to determine a presence of a medical condition.
  • the condition identifier engine 108 can identify physical drifts — e.g., in (X,Y,Z) — observed on different vertices (e.g., estimated on frontal faces and profile faces, respectively) across years.
  • the physical drifts can be processed by a deep learning system to characterize sleep apnea onset.
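A minimal sketch of aligning two face meshes on the midpoint of the eyes and computing per-vertex drift is shown below; the eye landmark indices are assumed inputs, and alignment is translation-only for brevity.

```python
import numpy as np

def align_and_drift(mesh_year1, mesh_year2, left_eye_idx, right_eye_idx):
    """Align two (N, 3) face meshes on the eye midpoint and return per-vertex drift.

    Alignment here is translation-only; scale and rotation normalization
    could be added similarly before computing drift.
    """
    def center(mesh):
        midpoint = (mesh[left_eye_idx] + mesh[right_eye_idx]) / 2.0
        return mesh - midpoint

    return center(mesh_year2) - center(mesh_year1)  # (N, 3) drift in (X, Y, Z)

# Drifts computed across several years could be stacked and provided to the
# deep learning system described above to characterize, e.g., OSA onset.
```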
  • the condition identifier engine 108 can include one or more trained models for determining an onset of a medical condition.
  • one or more models can be trained to detect one or more of the following signatures of OSA onset: (i) Rounding of face across years: Increase in facial fat deposits can result in puffy cheeks, rounded or swollen face. The increased fat deposits can be indicative of narrowed airways and hence higher likelihood of OSA; (ii) Appearance of double chin: Obesity, skin losing elasticity due to age, weakening of muscles due to age or lack of exercise, poor posture (slouching or frequently looking down on electronic devices) can result in double chin.
  • the system 100 operates, at least in part, on an edge device.
  • Sensitive data can be kept on a user device (e.g., of the data capture engine 104).
  • the condition identifier engine 108 can perform one or more processes, such as: (i) fitting face models to images obtained on an edge device; (ii) extracting features from a craniofacial complex of a subject; (iii) extracting features from an anterior cervical region, or a combination of these among others.
  • the condition identifier engine 108 can be performed by a device on the edge, e.g., by a mobile device of a subject, such as a smartphone.
  • a face image can be discarded in response to features being extracted.
  • the condition identifier engine 108 can save just the extracted features, e.g., fiducial feature locations, on an external memory device, e.g., on a server. By saving only the extracted features and not the original image of the subject, the sensitive data of the subject can be preserved with reduced risk of exposure. In some cases, the condition identifier engine 108 can save extracted features with data referencing a corresponding location in one or more images obtained on a device of the data capture engine 104.
  • the data capture engine 104 or the condition identifier engine 108 includes one or more encoders configured to encode data into one or more embeddings.
  • the condition identifier engine 108 can include one or more transformers, e.g., that are data modality specific.
  • the condition identifier engine 108 can include a cross-modal fusion transformer, e.g., that transforms output from the one or more modality-specific transformers.
  • the condition identifier engine 108 can use output of the cross-modal fusion transformer to generate the identified medical condition 110.
  • Deep learning neural networks, through multi-layered hierarchical structures, can be used on one or more data modalities to process and provide analysis for subjects.
  • Transformers can be used with input-embeddings, positional encoding mechanisms, self-attention, and cross-modal attention mechanisms.
  • A sparsely gated mixture-of-experts layer can be integrated with transformer architectures to, e.g., increase a capacity of the model while keeping computational complexity of the model limited.
  • the system 100 can have one or more of the following components (e.g., as part of the data capture engine 104 or the condition identifier engine 108): (i) Input Embedding Engine: Each input modality can be pre-processed and converted to a respective embedding using pre-trained models such as ResNet or VGG or similar such for images, BERT or word2vec or similar such for text, Mel-frequency cepstral coefficients (MFCC) or similar such for audio; (ii) Modality-specific transformers: The embeddings obtained from each modality can be processed by transformers dedicated to processing that modality.
  • a gating network can be used to weigh the contributions of each expert based on input data.
  • Prediction head: A fully convolutional network or a fully connected layer can be deployed to aggregate information from the mixture of experts and produce a final sleep apnea detection result.
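A simplified sparsely gated mixture-of-experts layer with top-k gating is sketched below in PyTorch; the expert architecture, number of experts, and gating details are illustrative rather than taken from this description.

```python
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    """Sparsely gated mixture-of-experts layer (top-k gating), as a sketch.

    Each expert is a small feed-forward block; a gating network weighs
    expert contributions per input, and only the top_k experts contribute.
    """
    def __init__(self, dim: int, n_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
             for _ in range(n_experts)])
        self.gate = nn.Linear(dim, n_experts)
        self.top_k = top_k

    def forward(self, x):                      # x: (batch, dim)
        scores = self.gate(x)                  # (batch, n_experts)
        top_vals, top_idx = scores.topk(self.top_k, dim=-1)
        weights = torch.softmax(top_vals, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

fused = SparseMoE(dim=64)(torch.randn(8, 64))  # e.g., atop a transformer encoder block
```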
  • models used in the system 100 can be trained using unsupervised machine learning techniques, e.g., from video or images.
  • one or more models of the system 100 are evaluated using test and validation datasets that are non-overlapping with training datasets.
  • ROC-AUC curves, F1 scores, among other values can be generated to evaluate various operating points.
  • the condition identifier engine 108 can be configured to perform one or more of: (i) face registration: Face image can be processed, e.g., by a pretrained deep learning model, to fit a model to the face. Model can identify over 100 feature points. A dense set of measurements can be extracted from the feature points (e.g., from the lower and upper jaw, from the lower face, from the mid face, or holistically); (ii) hand registration: A hand image can be processed, e.g., by a pretrained deep learning model, that identifies the finger joint locations; (iii) oral cavity registration: using an image processing library, the condition identifier engine 108 can detect teeth positions along a lower or upper jaw, correct for any in plane rotation, identify different components of the oral cavity pixel-wise, or perform a combination of these among other processes; (iv) converting questionnaire responses to a binary vector.
  • Ground truth data can be generated in the following way.
  • a subject can visit a sleep clinic/cardiologist/general physician.
  • the subject can provide data (e.g., face scan, hand scan, oral cavity scan, questionnaire answers, or a combination of these among others).
  • the subject can be evaluated by a professional for a medical condition, such as OSA, CVD, or both among other conditions.
  • a professional diagnosis can be recorded, e.g., as a diagnostic code, visit summary, comorbid conditions, potential underlying conditions, or suggested treatment plans.
  • the records can indicate a mix of medical conditions, such as OSA only, CVD only, OSA and CVD, other conditions or condition combinations, or no conditions.
  • the condition identifier engine 108 can be configured to obtain one or more items of multi-modal data.
  • the condition identifier engine 108 can include a transformer (e.g., based on a deep learning model).
  • the transformer model can be trained for estimating a probability of a medical condition, such as a presence of OSA and CVD or one of these, among other conditions.
  • the model can identify potential underlying causes.
  • the model can recommend potential treatment plans.
  • Medical conditions identified by the system 100 can include Obstructive Sleep Apnea (OSA) with associated causes including retrognathic mandible, retrognathic maxilla, mandibular insufficiency, enlarged tonsils, enlarged adenoids, high arched palate, crowding of teeth, micrognathia, or a combination of these among others.
  • the system 100 can detect one or more causes and attribute an identification of a medical condition to one or more of the causes.
  • Medical conditions identified by the system 100 can include CVD with associated causes including diagonal earlobe crease, xanthelasma, butterfly rash, Bell’s palsy, xanthoma, arcus juvenilis, acanthosis nigricans, skin tags, ear lobe crease, nicotine stains, hyperpigmented hands in betel quid sellers, or a combination of these among others.
  • the system 100 includes providing data to a medical professional for diagnosis.
  • the data capture engine 104 can capture data and provide at least a portion of the captured data to a medical professional.
  • the medical professional can review the data provided.
  • the medical professional can provide diagnostic information, such as a determined symptom or condition.
  • the medical professional can write a diagnosis.
  • Information provided by the medical professional can be provided to the system 100, e.g., as part of the data capture engine 104.
  • the information can be processed and used for determining one or more diagnoses.
  • a language model, such as a large language model, can parse text data from a medical professional to identify one or more factors or conditions within provided information.
  • the condition identifier engine 108 uses data extracted from information provided by a medical professional to help determine output results, e.g., the identified medical condition 110.
  • a medical professional annotates captured data.
  • a medical professional can annotate one or more captured images to, e.g., identify a location of one or more regions in a mouth, such as teeth, tongue, uvula, tonsils, palate, or a combination of these among others.
  • FIG. 2 shows another example medical condition determination system 200.
  • the system 200 includes the data capture engine 104 and the condition identifier engine 108.
  • the system 200 includes an appointment scheduling system 202 and a device 204 of the data capture engine 104.
  • the device 204 is a smartphone.
  • the data capture engine 104 can obtain the multimodal data 106 and provide the multi-modal data 106 to the condition identifier engine 108.
  • the condition identifier engine 108 includes one or more models trained to generate treatment actions 206.
  • the treatment actions 206 can include appointment data.
  • the condition identifier engine 108 provides the treatment actions 206 to the appointment scheduling system 202.
  • the treatment actions 206 include an indication for scheduling an appointment. For example, if a likelihood of a medical condition satisfies one or more thresholds, the condition identifier engine 108 can generate one or more actions of the treatment actions 206 indicating an automatic appointment.
  • the appointment scheduling system 202 can obtain an action indicating an automatic appointment and, in response, schedule an appointment for a given subject.
  • the appointment scheduling system 202 can provide feedback data to the device 204, e.g., confirming a scheduled appointment, providing an interface to adjust a scheduled appointment, providing an interface to add the appointment to a device calendar, or a combination of these among others.
  • the appointment scheduling system 202 can access a subject’s personal or work calendars, e.g., to identify an open time slot for scheduling an appointment. Appointment details can be provided by the appointment scheduling system 202 to a medical professional, e.g., providing indications of one or more medical condition risks, historical symptoms, or current symptoms.
  • a severity, or likelihood of severity, of a given medical condition determined by the condition identifier engine 108 determines whether the condition identifier engine 108 generates treatment actions 206 that include an action (i) for automatic appointment scheduling, (ii) for optional appointment scheduling, or (iii) other health recommendations.
  • the condition identifier engine 108 can generate an action for automatic appointment scheduling in cases where a subject’s medical condition poses a significant health risk, e.g., a risk of damage or harm to the subject’s wellbeing in the near term.
  • the condition identifier engine 108 can generate an action for optional appointment scheduling or other health recommendations — e.g., (i) manage a healthy weight, (ii) avoid alcohol or sedatives, (iii) avoid smoking, (iv) sleep on side, (v) establish regular sleep routine, or a combination of these among others — for medical conditions that are less severe, compared to medical conditions that cause the condition identifier engine 108 to generate an action for automatic appointment scheduling.
  • user preferences can determine what actions the condition identifier engine 108 generates, e.g., not automatically scheduling an appointment based on user preferences even if a medical condition would, for another user, be sufficient to cause the condition identifier engine 108 to generate an action for automatic scheduling.
  • the system 100 or the system 200 can generate output that is transmitted to a medical system or to a user.
  • a system can create a diagnostic report, e.g., indicating output from the condition identifier engine 108.
  • a report can be sent via email to a subject or medical professional.
  • a system can automatically upload data into an electronic medical record system or other application.
  • Uploaded data can include output from the condition identifier engine 108.
  • Uploaded data can be represented in various forms, such as PDFs, images, text, among others.
  • the appointment scheduling system 202 provides information of a clinic or other medical professional to be visited by a subject.
  • the appointment scheduling system 202 can provide data indicating map directions to a medical professional, a building map for finding an office of a medical professional, a suggested time for leaving (e.g., based on predicted other scheduled appointments of the subject, traffic, among other factors), or a combination of these among others.
  • FIG. 3 shows example processes of data capture for a medical condition determination system.
  • a process 302 shows image capture using a device with a display, such as the device 204.
  • the process 302 includes capturing one or more views of a subject’s body.
  • the process 302 shows capturing one or more views of a subject’s face, including a front, neck, right, and left profile view.
  • the process 302 is performed by a device with a display and front facing camera, e.g., a smartphone.
  • the process 302 can be performed, e.g., by the data capture engine 104.
  • the process 302 includes one or more audio or visual cues to a subject to display different views of body parts so that a camera can capture images of those body parts for processing.
  • data capture using the process 302 is performed with assistance from a clinician (e.g., a doctor or other medical professional).
  • a process 304 shows a subject rotating a device, e.g., a device of the data capture engine 104, to obtain one or more views.
  • the process 304 can include obtaining one or more images, such as a single image or video.
  • the process 304 can include a subject holding a device, such as a smartphone, and rotating the device around a body part of the subject to capture one or more views of the body part.
  • the different views captured can be used to generate a model of a body part, such as a face or other body part.
  • the processes 302 and 304 can be used to capture data for processing, e.g., by the system 100 to generate the identified medical condition 110 using one or more trained models of the condition identifier engine 108.
  • FIG. 4 shows an example system 400 for processing medical reports.
  • the system 400 includes an embedding engine 404, an index engine 408, an indexed vector store 410, a query embedding engine 414, a search engine 418, and a language model 422.
  • the system 400 processes one or more medical reports, such as medical report 402, and generates an indexed set of embeddings stored in the indexed vector store 410 — e.g., within computer device memory.
  • the system 400 can then query data in the indexed vector store 410 using a query 412.
  • Search results 420 of the query can be provided to the language model 422.
  • the language model 422 can use the search results 420 to generate a query result 424.
  • the query result 424 includes a written diagnosis or prediction of a medical condition for a subject.
  • the embedding engine 404 obtains the medical report 402.
  • the medical report 402 can be a report that is written by a medical professional, e.g., a doctor, and that indicates a medical condition of a subject.
  • the embedding engine 404 can be configured to generate embeddings using the medical report 402 as input data.
  • the embedding engine 404 separates the medical report 402 into text chunks — e.g., words, phrases, sentences, paragraphs, or sections.
  • the embedding engine 404 can then generate an embedding of the embeddings 406 for each of the separated text chunks.
  • the embedding engine 404 generates a set of embeddings without prior text separation. Text separation can be performed as a first part of a machine learning model that is trained to generate embeddings or can be a separate process or machine learning model that is trained to separate text chunks of the medical report 402.
  • non-text data is obtained from the medical report 402.
  • the medical report 402 can include data from a medical record system.
  • the embedding engine 404 can embed one or more non-textual features included in the medical report 402.
  • the non-textual features can include medical images, such as x-ray images, diagrams, or the like.
  • each non-textual element can be encoded by the embedding engine 404 as an embedding of the embeddings 406.
  • the embedding engine 404 can provide the embeddings 406 to the index engine 408.
  • the index engine 408 can obtain the embeddings 406.
  • the index engine 408 can use the embeddings 406 to generate one or more indexed embeddings which can be stored as indexed vectors in the indexed vector store 410.
  • An indexing process can include mapping each vector to a specific position within a multidimensional space, e.g., using algorithms like k-d trees or locality-sensitive hashing (LSH). Indexing can allow for rapid similarity searches and enable quick and accurate retrieval of relevant information when querying an indexed data set.
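A small stand-in for the indexing and retrieval step is shown below using scikit-learn's nearest-neighbor search over embedding vectors; in practice a k-d tree, LSH index, or dedicated vector database could fill this role, and the embedding dimensions here are illustrative.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Illustrative stand-in for the indexed vector store: embeddings of report
# chunks, indexed for nearest-neighbor retrieval.
rng = np.random.default_rng(0)
chunk_embeddings = rng.random((500, 384))
index = NearestNeighbors(n_neighbors=5, metric="cosine").fit(chunk_embeddings)

query_embedding = rng.random((1, 384))
distances, chunk_ids = index.kneighbors(query_embedding)
print(chunk_ids)  # positions of the most similar report chunks
```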
  • the system 400 can include querying the data stored in the indexed vector store 410.
  • the query embedding engine 414 can obtain the query 412.
  • the query 412 can include a query of a subject or medical professional, e.g., provided using a computer interface to one or more devices running the system 400, such as a desktop computer or smartphone.
  • the query embedding engine 414 can generate a search embedding 416 using the query 412.
  • the query embedding engine 414 can generate an embedding similar to the embedding engine 404.
  • An embedding can include one or more values that help to identify a given segment of text, such as a segment of the medical report 402 or at least a portion of the query 412.
  • the query embedding engine 414 can include one or more models that are trained to encode data into embeddings.
  • an embedding engine can be trained using an encoder and decoder network setup.
  • An embedding engine can be an encoder. A system of an encoder and a decoder can take input data, such as text or non-text input.
  • the encoder can transform the input into an embedding.
  • the decoder can then attempt to generate the input data from the generated embedding.
  • the decoder and encoder can be separately or jointly trained to generate an encoder and decoder pair that can generate embeddings sufficient to later generate the input data.
  • an encoder can be used in an embedding engine. Subsequent training can be performed as the system 400 performs operations, e.g., in response to an identified medical condition being determined to be correct or incorrect.
  • the query embedding engine 414 provides the search embedding 416 to a search engine 418.
  • the search engine 418 can query the indexed vector store 410 to generate the search results 420.
  • the search results 420 can indicate information related to the query 412, e.g., answers to one or more medical questions posed in the query 412.
  • the search results 420 can provide one or more historical medical conditions for subject A in response to the query 412 asking for one or more historical medical conditions for subject A.
  • the language model 422 generates the query result 424 using one or more large language models.
  • the language model 422 can obtain the search results 420 and summarize the results as the query result 424.
  • the query result 424 can be used, e.g., as output on a device for a user that provided the query 412 or as data for subsequent processes.
  • the query result 424 can be used for processing in the system 100.
  • the data capture engine 104 can use the system 400 to obtain specific data for specific subjects, e.g., using historically obtained and processed medical reports.
  • system 400 can be configured to process non-medical report data.
  • system 400 can process data that is related to a given subject. The data can then be stored for use in query results or to be used in further processing, e.g., to predict a presence of a medical condition, such as in the system 100 processes.
  • the embedding engine 404 and the query embedding engine 414 include a pre-trained language model. In some cases, the embedding engine 404 and the query embedding engine 414 include the same embedding process, e.g., an embedding process of the language model 422. However, instead of generating output using solely the language model 422, the system 400 can use the search engine 418 to generate specific medical information to improve results from the language model 422. The language model 422 can use trained elements that simulate language understanding and context to interpret the search results 420.
  • FIG. 5 shows additional elements of an example condition identifier engine.
  • FIG. 5 shows additional details that can optionally be used in the system 100.
  • FIG. 5 shows the condition identifier engine 108 with additional elements that include a text data encoder 502, an image data encoder 504, a modal fusion engine 506, and a decoder engine 508.
  • the condition identifier engine 108 can generate output that includes the identified medical condition 110 and the treatment actions 112.
  • the text data encoder 502 generates one or more embeddings from text included in the multi-modal data 106, such as the questionnaire data. Embeddings can be generated in a similar manner to the embeddings shown and discussed in reference to the system 400 of FIG. 4. In some cases, the text data encoder 502 can obtain text data, e.g., from the questionnaire data or medical report data, and generate a vector of numbers. The vector of numbers can help to identify that data for downstream processing.
  • condition identifier engine 108 includes additional data encoders for other data modalities, such as the image data encoder 504. Although shown with only two encoders, the condition identifier engine 108 can include additional or fewer encoders in some cases.
  • the image data encoder 504 can be configured to obtain image data, such as image data from the multi-modal data 106 that represents a body part of a subject, such as a face of the subject, and generate an embedding.
  • the embedding can be a set of values that help to indicate a particular image or set of images.
  • the modal fusion engine 506 fuses data from one or more encoders.
  • the modal fusion engine 506 can fuse data from the text data encoder 502 and the image data encoder 504.
  • the modal fusion engine 506 fuses data using one or more concatenation operations.
  • the modal fusion engine 506 can concatenate one or more embeddings, from one or more encoders, to generate a fused embedding to provide to the decoder engine 508.
  • the modal fusion engine 506 includes an encoder model, such as an encoder model that can be included in the text data encoder 502 and the image data encoder 504, that is configured to encode data from one or more encoders into a single output, such as a single embedding.
  • the single embedding can be different from a concatenated embedding, e.g., it may be a compressed version of such a concatenated embedding or a new embedding, such as a new embedding that maps the obtained encoder data into a new feature space that is either the same or different dimension compared to the data from the one or more encoders.
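A sketch of concatenation-based fusion followed by a projection into a single fused embedding is shown below; the embedding dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ConcatFusion(nn.Module):
    """Fuse per-modality embeddings by concatenation followed by a projection.

    A sketch of the fusion step only; text_dim, image_dim, and fused_dim are
    illustrative sizes, not values from this description.
    """
    def __init__(self, text_dim=256, image_dim=512, fused_dim=256):
        super().__init__()
        self.project = nn.Sequential(
            nn.Linear(text_dim + image_dim, fused_dim), nn.ReLU())

    def forward(self, text_emb, image_emb):
        return self.project(torch.cat([text_emb, image_emb], dim=-1))

fused = ConcatFusion()(torch.randn(4, 256), torch.randn(4, 512))  # -> (4, 256)
```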
  • the decoder engine 508 obtains output data from the modal fusion engine 506.
  • the decoder engine 508 can be configured to decode output data from the modal fusion engine 506 into outputs, such as one or more of the identified medical condition 110, the treatment actions 112, or a combination of these among others.
  • the decoder engine 508 can be trained separately or jointly with one or more elements of the condition identifier engine 108.
  • the decoder engine 508 can obtain data from the modal fusion engine 506 indicating training input data, where the training input data corresponds to ground truth data indicating one or more outputs, such as an identified medical condition or treatment actions.
  • the decoder engine 508 can generate estimated output.
  • a system, such as the system 100 of FIG. 1, can generate a difference value indicating a difference between the estimated output of the decoder engine 508 and ground truth data. Based on the difference value, the system can update one or more weights or parameters of one or more models of the condition identifier engine 108. For example, the system can update one or more weights or parameters of one or more of the encoders, the modal fusion engine 506, the decoder engine 508, or a combination of these among others.
  • FIG. 6 shows an example subgraph process for risk probability predictions.
  • FIG. 6 shows a collection of nodes representing subjects in a multi-dimensional space. For ease of illustration, the multi-dimensional space is depicted in two dimensions. There are three clusters of nodes, including a first cluster 602, a second cluster 604, and a third cluster 606. Example nodes 604a and 606a are shown in the second and third clusters represented by the multi-dimensional vectors 604b and 606b, respectively.
  • the multi-dimensional vectors can represent various features of different subjects.
  • the nodes represent various features of different subjects, such as medical conditions, biographic information, ethnicity, or a combination of these among others.
  • nodes that are closer to one another represent subjects that are more medically similar, and nodes within the same cluster represent subjects that are more medically similar to one another than to subjects represented by nodes in different clusters.
  • a first embedding engine 610 and a second embedding engine 612 can be used to generate output data using one or more values from the nodes.
  • the first embedding engine 610 and the second embedding engine 612 can be part of a subgraph neural network that serves as a multi-class classifier.
  • the subgraph neural network can identify comorbidity risks faced by ethnic minorities, the socioeconomically disadvantaged, subjects in underserved communities, or a combination of these among others.
  • the first embedding engine 610 identifies feature embeddings using the nodes. For example, the first embedding engine 610 can identify feature embeddings on a holistic graph. In some cases, the embeddings can include values that represent a node’s position in a multi-dimensional space. The first embedding engine 610 can generate embeddings using data from one or more subjects. In some cases, the first embedding engine 610 is trained using a decoder that tries to recreate input data based on an intermediate embedding generated by the first embedding engine 610 operating on that input data.
  • the second embedding engine 612 generates one or more embeddings using a more limited context compared to the first embedding engine 610.
  • a more limited context can correspond to (a) similarity in ethnicity, age, or gender, (b) similarity in socioeconomic status, (c) geographic proximity, among other similarities, e.g., indicated in profile data.
  • the first embedding engine 610 can generate an embedding for a given subject using a context of a multi-dimensional space that includes all subjects
  • the second embedding engine 612 can generate one or more embeddings using a context of a cluster, such as the first cluster 602.
  • the second embedding engine 612 can be trained to generate embeddings, e.g., based on one or more first embeddings generated using the first embedding engine 610.
  • embeddings from the first embedding engine 610 and the second embedding engine 612 are concatenated, e.g., by an operating system, such as the system 100.
  • embeddings from the first embedding engine 610 and the second embedding engine 612 can be concatenated and provided to machine learning model 614.
  • the machine learning model 614 includes a set of fully connected layers.
  • each layer can be followed by batch normalization layers and activation functions (ReLU).
  • One or more layers can be provided to a softmax layer.
  • Output of the softmax layer can include output indicating the risk probabilities 616.
  • the risk probabilities 616 can indicate risk for one or more medical conditions for a subject, a group of subjects, a specific ethnic community, or a combination of these among others.
  • each node in FIG. 6 represents a patient.
  • Each node can include a numerical feature vector.
  • a vector can include attributes like encoded zipcode, age, encoded gender, encoded ethnicity, income range, condition risk score, such as OSA risk score, or a combination of these, among others.
  • Edges can be created based on similarity measures, such as geographic proximity or demographic similarity, with weights reflecting a strength of relationships between nodes.
  • Ground truth labels or scores representing comorbidity risks can be assigned to each patient.
  • This structured data, including one or more of node features, edge indices, edge weights, and ground truth, can then be combined into a data object for a graph neural network (GNN).
  • Comorbidity risk ground truth can be represented by one or more values, e.g., {100, 0, 50, 0, 0, 0} representing, respectively, {heart disease, diabetes, hypertension, GERD, depression, obesity}.
  • a GNN can include layers such as Graph Convolution layers (GCN) and attention layers. Each layer can have its own initialized weights.
  • a GNN can compute holistic and subgraph-based embeddings, e.g., through a series of propagation and aggregation steps. Initially, each node’s feature vector can be updated by aggregating information from neighboring nodes, weighted by edge weights, using methods such as mean, sum, or max pooling, among others.
  • a holistic embedding can be identified that encompasses information from an expanded neighborhood.
  • a subgraph-based embedding can be generated using a neighborhood of nodes (e.g., within a threshold distance in a vector space).
  • a subgraph-based embedding can include identifying a node neighborhood to aggregate information, e.g., based on data representing geographic regions or ethnic groups.
  • a GNN can update node features within a subgraph context to produce localized embeddings.
  • combined holistic and subgraph embeddings can be used to predict comorbidity risks, e.g., by feeding them into a classifier, such as a multi-layer perceptron (MLP).
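The combination of holistic and subgraph embeddings could look like the sketch below, which uses simple mean aggregation over adjacency matrices and a small MLP classifier; the feature sizes, adjacency construction, and six-way output are illustrative stand-ins for the GNN layers described above.

```python
import torch
import torch.nn as nn

def aggregate(features, adjacency):
    """One mean-aggregation step: each node averages its neighbors' features.

    features: (n_nodes, d); adjacency: (n_nodes, n_nodes) edge-weight matrix.
    """
    deg = adjacency.sum(dim=1, keepdim=True).clamp(min=1e-6)
    return adjacency @ features / deg

# Illustrative inputs: patient feature vectors, a holistic adjacency over all
# patients, and a subgraph adjacency restricted to, e.g., one community.
x = torch.randn(100, 16)
adj_holistic = torch.rand(100, 100)
adj_subgraph = adj_holistic * (torch.rand(100, 100) > 0.7)

holistic_emb = aggregate(x, adj_holistic)
subgraph_emb = aggregate(x, adj_subgraph)

classifier = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 6))
risk_logits = classifier(torch.cat([holistic_emb, subgraph_emb], dim=-1))
risk_probs = torch.softmax(risk_logits, dim=-1)  # e.g., six comorbidity classes
```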
  • the classifier can use these embeddings to learn complex patterns and relationships, e.g., between patient attributes and comorbidity risks.
  • a model can minimize a loss function, such as cross-entropy loss, for classification tasks or mean squared error for regression tasks, through techniques such as backpropagation and weight updates.
  • a model can accurately predict comorbidity risks for new patients by leveraging the comprehensive and localized information captured in one or more embeddings. This can help enable healthcare providers to identify high-risk individuals and tailor interventions more effectively, ultimately improving patient outcomes and resource allocation.
  • FIG. 7 is a flowchart of an example process 700 for medical condition identification.
  • the process 700 will be described as being performed by a system of one or more computers, located in one or more locations, and programmed appropriately in accordance with this specification.
  • a medical diagnostics system, e.g., the system 100 of FIG. 1, appropriately programmed, can perform the process 700.
  • the process 700 includes obtaining visual data representing at least one body part of a subject (702).
  • the data capture engine 104 of the system 100 can capture an image of a face of a subject.
  • the process 700 includes obtaining non-visual data that corresponds to one or more biological characteristics of the subject (704).
  • the data capture engine 104 of the system 100 can capture audio or questionnaire data to generate the multi-modal data 106.
  • the process 700 includes providing data representing (i) the visual data and (ii) the non-visual data to one or more machine learning models (706).
  • the one or more machine learning models can be trained using a set of training data to predict presence of a medical condition.
  • the one or more machine learning models can be included in the condition identifier engine 108 of the system 100.
  • the process 700 includes determining presence of the medical condition for the subject using output of the one or more machine learning models (708).
  • the output of the one or more machine learning models can be generated based on the one or more machine learning models processing the visual data and the non-visual data.
  • the condition identifier engine 108 can generate output including the identified medical condition 110. In some cases, the condition identifier engine 108 generates output including the treatment actions 112 — e.g., for setting up an appointment or providing feedback to a subject or medical professional.
  • output indicating a presence of the medical condition can be used for clinical decision support.
  • the system 100 can provide output to a medical professional to help the professional diagnose one or more patients using data obtained for the one or more patients.
  • FIG. 8 is a flowchart of an example process 800 for a medical condition identification platform.
  • the process 800 will be described as being performed by a system of one or more computers, located in one or more locations, and programmed appropriately in accordance with this specification.
  • a medical diagnostics system, e.g., the system 100 of FIG. 1, appropriately programmed, can perform the process 800.
  • the process 800 includes receiving an input selecting a medical condition from a plurality of medical conditions available for diagnosis (802).
  • a graphical user interface can be provided to allow a user to select from one or more available medical conditions.
  • the user can be a subject for whom a medical condition is to be identified or a medical professional working to identify one or more medical conditions.
  • the process 800 includes, in response to receiving the input, accessing stored data corresponding to the plurality of medical conditions and a plurality of machine learning models that are configured to determine presence of the plurality of medical conditions (804).
  • a platform such as one or more devices performing operations described in reference to FIG. 1, can store data corresponding to a plurality of medical conditions, such as OSA, heart disease, stroke, or other medical conditions.
  • the platform can store a plurality of machine learning models that are configured to determine presence of the plurality of medical conditions.
  • the system 100 can select elements of the condition identifier engine 108 based on the task of detecting the presence of OSA.
  • the system 100 stores a plurality of models, at least one of which is configured to detect a presence of OSA.
  • the system 100 can configure the appropriate stored model for detection based on a detection selection by a user, e.g., configuring an OSA model in response to a user selecting OSA as a medical condition to be detected.
  • the process 800 includes obtaining from the stored data, using the received input, data corresponding to the selected medical condition and at least one trained machine learning model that is configured to determine presence of the selected medical condition (806).
  • a system such as the system 100, can obtain data indicating a machine learning model.
  • the data can include data for processing by the condition identifier engine 108, such as data of one or more models, e.g., data indicating parameter values, neural network layers, or a combination of these among others.
  • the process 800 includes providing medical data representing a subject to the at least one trained machine learning model (808).
  • a system such as the system 100, can obtain data, e.g., using the data capture engine 104.
  • Data captured can include medical data representing a subject.
  • the data can be provided to a trained model, such as one or more trained models of the condition identifier engine 108.
  • the process 800 includes determining presence of the selected medical condition for the subject using output of the at least one trained machine learning model (810).
  • the output of the at least one trained machine learning model is generated based on the at least one trained machine learning model processing the medical data.
  • the condition identifier engine 108 can process data captured by the data capture engine 104 to generate the identified medical condition 110.
  • the systems described include a continual learning based multimodal deep learning system.
  • the learning system can be designed for high capacity amidst conditional computations serving the purpose of medical condition screening, such as screening for OSA.
  • a machine learning model of a system such as the system 100, can be trained to determine intermodal correlations, preserve per-modal specificity, and develop an understanding of both symptomatic and anatomical signatures of obstructive sleep apnea.
  • Systems described, such as the system 100, can handle subtle variations to input data (which can be expected in datasets from across multiple clinics or data sources), for instance in the form of sleep questionnaires with subtle revisions to questions, images from the anterior cervical region, or profile images of the cranio-facial complex, among others. One or more models of the system 100 can be trained to perform nuanced tasks, such as generating treatment plans, with incremental training and data annotations, e.g., from physician diagnostic codes.
  • Some traditional deep learning systems are monolithic and can suffer from catastrophic forgetting when tasked with incremental learning. Further, some systems rely on dense activations, and such high-capacity systems can suffer from computational bottlenecks.
  • the techniques described can help avoid these limitations.
  • techniques can include using systems with a deep learning architecture and one or more transformers.
  • Transformers can be modality agnostic and include a multi-head self-attention mechanism that efficiently captures dependencies and relationships within input data.
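  • For illustration only, multi-head self-attention over a modality-agnostic token sequence can be exercised with PyTorch's built-in module (the shapes and sizes below are arbitrary examples, not parameters of this disclosure):

        import torch
        import torch.nn as nn

        # 32 tokens (e.g., questionnaire answers or image patches), embedding size 256.
        tokens = torch.randn(1, 32, 256)   # (batch, sequence, embedding)
        attention = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)

        # Self-attention: queries, keys, and values all come from the same token sequence,
        # so each token can attend to every other token regardless of its source modality.
        attended, weights = attention(tokens, tokens, tokens)
        print(attended.shape, weights.shape)   # (1, 32, 256), (1, 32, 32)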
  • a system can include one or more expert layers, e.g., sitting atop transformer encoder blocks, that leverage the expertise of one or more specialized sub-models to handle different aspects of the input data. The expert models can themselves be separate transformers, CNNs, or ResNets.
  • a system can include an application of softmax or sigmoid-based gating mechanisms that assign weights to individual expert models and perform expert fusion. The fused output can be provided to subsequent layers to generate a final output.
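  • A minimal sketch of the expert layers and softmax-based gating described above (PyTorch; the class and the stand-in experts are hypothetical, not the disclosed architecture) weights each expert's output and fuses them before the subsequent layers:

        import torch
        import torch.nn as nn

        class GatedExpertFusion(nn.Module):
            """Weights outputs of specialized expert models with a softmax gate and fuses them."""
            def __init__(self, input_dim: int, hidden_dim: int, num_experts: int):
                super().__init__()
                # Stand-in experts; in practice each could be a transformer, a CNN, or a ResNet.
                self.experts = nn.ModuleList(
                    nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
                    for _ in range(num_experts)
                )
                self.gate = nn.Linear(input_dim, num_experts)   # one weight per expert
                self.head = nn.Linear(hidden_dim, 1)            # subsequent layer -> final output

            def forward(self, x: torch.Tensor) -> torch.Tensor:
                weights = torch.softmax(self.gate(x), dim=-1)               # (batch, num_experts)
                outputs = torch.stack([e(x) for e in self.experts], dim=1)  # (batch, num_experts, hidden)
                fused = (weights.unsqueeze(-1) * outputs).sum(dim=1)        # expert fusion
                return self.head(fused)                                     # e.g., an OSA logit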
  • a system can be trained using one or more loss functions.
  • the described systems can process heterogeneous data, such as text-based and image-based data obtained from multiple clinics or data sources.
  • processes performed by described systems, such as the system 100 can include tokenization and data embedding strategies for text and image-based input data streams, e.g., obtained from multiple clinics.
  • Techniques can include data augmentation, data alignment, synchronization, or a combination of these among others.
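  • One hedged way to standardize heterogeneous text and image streams before embedding is sketched below (torchvision for image preprocessing; the whitespace tokenizer and tiny vocabulary are simplified assumptions, not the tokenization scheme of this disclosure):

        import torch
        import torch.nn as nn
        from torchvision import transforms
        from PIL import Image

        # Image stream: resize and normalize photos (e.g., of the anterior cervical region).
        image_pipeline = transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
        ])

        # Text stream: a deliberately simple whitespace tokenizer over questionnaire answers.
        vocab = {"<unk>": 0, "snoring": 1, "daytime": 2, "sleepiness": 3, "often": 4}
        embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=64)

        def embed_text(answer: str) -> torch.Tensor:
            ids = [vocab.get(tok, vocab["<unk>"]) for tok in answer.lower().split()]
            return embedding(torch.tensor(ids))   # (num_tokens, 64)

        # image_tensor = image_pipeline(Image.open("neck_profile.jpg"))   # (3, 224, 224)
        # text_tensor = embed_text("snoring often daytime sleepiness")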
  • a system can generate standardized ground-truth labels, e.g., using one or more physician diagnostic codes.
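  • A hedged sketch of deriving standardized ground-truth labels from physician diagnostic codes follows (the code-to-label mapping is an illustrative assumption; e.g., ICD-10 G47.33 is commonly used for obstructive sleep apnea, and the actual code set would be curated clinically):

        # Map physician diagnostic codes to a binary OSA label for training.
        OSA_CODES = {"G47.33"}   # illustrative only

        def label_from_codes(diagnostic_codes: list[str]) -> int:
            """Return 1 if any code indicates OSA, else 0."""
            return int(any(code.strip().upper() in OSA_CODES for code in diagnostic_codes))

        # label_from_codes(["G47.33", "I10"]) -> 1   (OSA present alongside hypertension)
        # label_from_codes(["J45.909"])       -> 0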
  • training, test, and validation data sets are identified. For example, non-overlapping data sets that are representative of real-world scenarios can be identified for use in training, testing, or validation. In some cases, data sets that are representative of real-world scenarios can be included in the training data sets, and data sets with missing or noisy data can be included in the test and validation data sets.
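  • A minimal sketch of building non-overlapping training, validation, and test sets at the subject level is shown below (Python; the split fractions and the records structure are assumptions), which prevents the same subject from appearing in two sets:

        import random

        def split_by_subject(records, train_frac=0.7, val_frac=0.15, seed=0):
            """Split records into non-overlapping sets so no subject appears in two sets."""
            subjects = sorted({r["subject_id"] for r in records})
            random.Random(seed).shuffle(subjects)
            n_train = int(len(subjects) * train_frac)
            n_val = int(len(subjects) * val_frac)
            groups = {
                "train": set(subjects[:n_train]),
                "val": set(subjects[n_train:n_train + n_val]),
                "test": set(subjects[n_train + n_val:]),
            }
            return {name: [r for r in records if r["subject_id"] in ids]
                    for name, ids in groups.items()}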
  • techniques include regression testing.
  • techniques can include iteratively validating performance and generalization of the system using representative datasets.
  • Techniques can include identifying potential biases, overfitting, model complexity, or a combination of these among others.
  • a system can include hyperparameter tuning, e.g., by deploying a cluster of servers to obtain an initial operating point.
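  • Hyperparameter tuning to reach an initial operating point can be sketched as a search over a small grid (shown serially here; in practice each configuration could be dispatched to a server in a cluster; all values and the train_and_validate routine are hypothetical):

        from itertools import product

        search_space = {
            "learning_rate": [1e-4, 3e-4, 1e-3],
            "num_experts": [4, 8],
            "dropout": [0.1, 0.3],
        }

        def grid(space):
            keys = list(space)
            for values in product(*(space[k] for k in keys)):
                yield dict(zip(keys, values))

        best = None
        for config in grid(search_space):        # each config could run on its own server
            score = train_and_validate(config)   # hypothetical training/validation routine
            if best is None or score > best[0]:
                best = (score, config)
        print("initial operating point:", best)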
  • the described techniques allow for screening large populations for medical conditions such as Obstructive Sleep Apnea (OSA) or other conditions.
  • the described techniques allow for creating Clinical Decision Support (CDS) reports that help expedite a physician’s review of a patient.
  • the CDS report can include output from a subject screening and can provide supplementary information in the form of recommended interventions or comorbid risk conditions the subject may be exposed to given their age, gender, ethnic background, or other factors.
  • the described techniques allow for developing deep learning models that cater to specific populations. Techniques can include data capture using a smartphone with a camera and internet connectivity. Techniques can be performed as a part of a remote tele-health visit.
  • Techniques can help make screening or diagnosis for medical conditions, such as sleep apnea, accessible, especially for people far from existing diagnosis or screening professionals.
  • the techniques can enable personalized screening or diagnosis that can help with early detection of medical conditions, e.g., by combining subjective user reported data with objective biological data, such as images obtained of a subject.
  • Techniques can include providing a personalized recommendation to medical professionals with decision support on optimal treatment, such as oral appliance therapy through a dentist or sleep dentist, CPAP therapy, prescription medication, lifestyle changes, surgical intervention, e.g., stimulation or mandibular advancement, or a combination of these among others.
  • the subject matter and the actions and operations described in this specification can be implemented in digital electronic circuitry, in tangibly-embodied computer software or firmware, in computer hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them.
  • the subject matter and the actions and operations described in this specification can be implemented as or in one or more computer programs, e.g., one or more modules of computer program instructions, encoded on a computer program carrier, for execution by, or to control the operation of, data processing apparatus.
  • the carrier can be a tangible non-transitory computer storage medium.
  • the carrier can be an artificially-generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus.
  • the computer storage medium can be or be part of a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
  • a computer storage medium is not a propagated signal.
  • data processing apparatus encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers.
  • Data processing apparatus can include special-purpose logic circuitry, e.g., an FPGA (field programmable gate array), an ASIC (application-specific integrated circuit), or a GPU (graphics processing unit).
  • the apparatus can also include, in addition to hardware, code that creates an execution environment for computer programs, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
  • a computer program can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages; and it can be deployed in any form, including as a stand-alone program, e.g., as an app, or as a module, component, engine, subroutine, or other unit suitable for executing in a computing environment, which environment may include one or more computers interconnected by a data communication network in one or more locations.
  • engine is used broadly to refer to a software-based system, subsystem, or process that is programmed to perform one or more specific functions.
  • an engine will be implemented as one or more software modules or components, installed on one or more computers in one or more locations. In some cases, one or more computers will be dedicated to a particular engine; in other cases, multiple engines can be installed and running on the same computer or computers.
  • a computer program may, but need not, correspond to a file in a file system.
  • a computer program can be stored in a portion of a file that holds other programs or data, e.g., one or more scripts stored in a markup language document, in a single file dedicated to the program in question, or in multiple coordinated files, e.g., files that store one or more modules, sub-programs, or portions of code.
  • the processes and logic flows described in this specification can be performed by one or more computers executing one or more computer programs to perform operations by operating on input data and generating output.
  • the processes and logic flows can also be performed by special-purpose logic circuitry, e.g., an FPGA, an ASIC, or a GPU, or by a combination of special-purpose logic circuitry and one or more programmed computers.
  • Computers suitable for the execution of a computer program can be based on general or special-purpose microprocessors or both, or any other kind of central processing unit.
  • a central processing unit will receive instructions and data from a read-only memory or a random access memory or both.
  • the essential elements of a computer are a central processing unit for executing instructions and one or more memory devices for storing instructions and data.
  • the central processing unit and the memory can be supplemented by, or incorporated in, special-purpose logic circuitry.
  • a computer will also include, or be operatively coupled to, one or more mass storage devices, and be configured to receive data from or transfer data to the mass storage devices.
  • the mass storage devices can be, for example, magnetic, magneto-optical, or optical disks, or solid state drives.
  • a computer need not have such devices.
  • a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device, e.g., a universal serial bus (USB) flash drive, to name just a few.
  • the subject matter described in this specification can be implemented on one or more computers having, or configured to communicate with, a display device, e.g., an LCD (liquid crystal display) monitor, or a virtual-reality (VR) or augmented-reality (AR) display, for displaying information to the user, and an input device by which the user can provide input to the computer, e.g., a keyboard and a pointing device, e.g., a mouse, a trackball, or a touchpad.
  • a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user’s device in response to requests received from the web browser, or by interacting with an app running on a user device, e.g., a smartphone or electronic tablet.
  • a computer can interact with a user by sending text messages or other forms of message to a personal device, e.g., a smartphone that is running a messaging application, and receiving responsive messages from the user in return.
  • That a system of one or more computers is configured to perform particular operations or actions means that the system has installed on it software, firmware, hardware, or a combination of them that in operation cause the system to perform the operations or actions.
  • That one or more computer programs is configured to perform particular operations or actions means that the one or more programs include instructions that, when executed by data processing apparatus, cause the apparatus to perform the operations or actions.
  • That special-purpose logic circuitry is configured to perform particular operations or actions means that the circuitry has electronic logic that performs the operations or actions.
  • the subject matter described in this specification can be implemented in a computing system that includes a back-end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a client computer having a graphical user interface, a web browser, or an app through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back-end, middleware, or front-end components.
  • the components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (LAN) and a wide area network (WAN), e.g., the Internet.
  • the computing system can include clients and servers.
  • a client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
  • a server transmits data, e.g., an HTML page, to a user device, e.g., for purposes of displaying data to and receiving user input from a user interacting with the device, which acts as a client.
  • Data generated at the user device e.g., a result of the user interaction, can be received at the server from the device.
  • the computing system can operate in a “privacy preserving mode” or offline mode, e.g., where processing or analysis is performed on the client side or inside other devices.
  • a trained model and system libraries can be packaged as an SDK and embedded in an application on the client side to perform partial (such as biometrics or other processing) or complete processing on the edge.
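  • One hedged way to package a trained model for such on-device processing is to export it to a serialized, framework-light format such as TorchScript, which an SDK embedded in a client application could then load and run offline (file names, input shape, and the earlier loader are illustrative assumptions):

        import torch

        # Export: convert the trained screening model to TorchScript for the SDK bundle.
        model = load_model_for_condition("osa")          # see the registry sketch earlier
        example_input = torch.randn(1, 256)              # shape matching the model's input
        scripted = torch.jit.trace(model, example_input)
        scripted.save("osa_screening_edge.pt")

        # On the client (inside the app / SDK), load and run entirely on the device:
        edge_model = torch.jit.load("osa_screening_edge.pt")
        with torch.no_grad():
            score = torch.sigmoid(edge_model(example_input)).item()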

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • General Health & Medical Sciences (AREA)
  • Epidemiology (AREA)
  • General Physics & Mathematics (AREA)
  • Primary Health Care (AREA)
  • Theoretical Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Radiology & Medical Imaging (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Quality & Reliability (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for medical condition identification. One of the methods includes obtaining visual data representing at least a portion of a subject's body; obtaining non-visual data that corresponds to one or more biological characteristics of the subject; providing data representing (i) the visual data and (ii) the non-visual data to one or more machine learning models, the one or more machine learning models being trained to predict the presence of a medical condition; obtaining output of the one or more machine learning models that is generated based on the one or more machine learning models processing the visual data and the non-visual data; and determining the presence of the medical condition for the subject using the output of the one or more machine learning models.
PCT/US2024/040376 2023-07-31 2024-07-31 Identification multimodale d'état médical Pending WO2025029914A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202363516647P 2023-07-31 2023-07-31
US63/516,647 2023-07-31

Publications (1)

Publication Number Publication Date
WO2025029914A1 true WO2025029914A1 (fr) 2025-02-06

Family

ID=94395993

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2024/040376 Pending WO2025029914A1 (fr) 2023-07-31 2024-07-31 Identification multimodale d'état médical

Country Status (1)

Country Link
WO (1) WO2025029914A1 (fr)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150161331A1 (en) * 2013-12-04 2015-06-11 Mark Oleynik Computational medical treatment plan method and system with mass medical analysis
US20150202395A1 (en) * 2012-08-22 2015-07-23 Resmed Paris Sas Breathing assistance system with speech detection
US20160180587A1 (en) * 2013-03-15 2016-06-23 Honeywell International Inc. Virtual mask fitting system
US20170345155A1 (en) * 2008-02-14 2017-11-30 The Penn State Research Foundation Medical image reporting system and method
US20190209405A1 (en) * 2018-01-05 2019-07-11 Sleep Number Corporation Bed having physiological event detecting feature
US20210007606A1 (en) * 2019-07-10 2021-01-14 Compal Electronics, Inc. Method of and imaging system for clinical sign detection
US20210217167A1 (en) * 2018-05-29 2021-07-15 The General Hospital Corporation System and method for analyzing medical images to detect and classify a medical condition using machine-learning and a case pertinent radiology atlas
US20210327585A1 (en) * 2020-04-02 2021-10-21 Blue Eye Soft, Inc. Systems and methods for transfer-to-transfer learning-based training of a machine learning model for detecting medical conditions
US20210398655A1 (en) * 2020-06-19 2021-12-23 Neil Reza Shadbeh Evans Machine learning algorithms for detecting medical conditions, related systems, and related methods
US20230080929A1 (en) * 2020-03-03 2023-03-16 Ella H. LIN Systems and methods for connecting service providers to clients

Similar Documents

Publication Publication Date Title
US12137979B2 (en) Eye system
US11553874B2 (en) Dental image feature detection
US12144594B2 (en) Hearing and monitoring system
US20240029901A1 (en) Systems and Methods to generate a personalized medical summary (PMS) from a practitioner-patient conversation.
CN105393252A (zh) Physiological data collection and analysis
Tobon et al. Deep learning in multimedia healthcare applications: a review
Huang et al. Generalized camera-based infant sleep-wake monitoring in nicus: A multi-center clinical trial
CN116469148A Probability prediction system and prediction method based on facial structure recognition
Lu et al. Video-based neonatal pain expression recognition with cross-stream attention
CN112542242A Data conversion/symptom scoring
Agarwal et al. Artificial intelligence for Iris-based diagnosis in healthcare
US20230268039A1 (en) Methods, systems, and computer-readable media for decreasing patient processing time in a clinical setting
CN119920484A Intelligent medical triage method, system, electronic device, and storage medium
US20240170129A1 (en) System and method for generating a neonatal disorder nourishment program
WO2025029914A1 (fr) Identification multimodale d'état médical
CN119339872A AI-based multimodal human-machine collaborative airway assessment method and system
Sujatha et al. Smart health care development: Challenges and solutions
Abawajy et al. Empirical investigation of multi-tier ensembles for the detection of cardiac autonomic neuropathy using subsets of the Ewing features
US20250218576A1 (en) Intelligent clinical user interfaces
Barika Sleep Apnoea Detection with Smart Internet of Things Technology
Khan et al. Health Recommender System for Sleep Apnea using Computational Intelligence
Hemrajani et al. Integrating physiological signals for enhanced sleep apnea diagnosis with SleepNet
Al-Dabet et al. Ocular-Induced Abnormal Head Posture: Diagnosis and Missing Data Imputation
Perez Pozuelo Digital phenotyping through multimodal, unobtrusive sensing
CN120895255A Body data analysis system based on a virtual avatar

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24850050

Country of ref document: EP

Kind code of ref document: A1