
WO2025160345A1 - System and method for acquiring ocular media - Google Patents

System and method for acquiring ocular media

Info

Publication number
WO2025160345A1
Authority
WO
WIPO (PCT)
Prior art keywords
image frame
processing unit
patient
image
video stream
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2025/012876
Other languages
English (en)
Inventor
Nicolas SOUBRY
Andrew HOSFORD
Michael Leung
Michael Ricci
Neha Nitin MORE
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Spect Inc
Original Assignee
Spect Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Spect Inc filed Critical Spect Inc
Publication of WO2025160345A1
Pending legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • A61B 3/145 Arrangements specially adapted for eye photography by video means
    • A61B 3/0025 Operational features characterised by electronic signal processing, e.g. eye models
    • A61B 3/12 Instruments for looking at the eye fundus, e.g. ophthalmoscopes
    • A61B 3/1216 Ophthalmoscopes for diagnostics of the iris
    • A61B 3/152 Arrangements for eye photography with means for aligning, spacing or blocking spurious reflection
    • G06V 40/18 Recognition of eye characteristics, e.g. of the iris
    • G16H 30/20 ICT specially adapted for handling medical images, e.g. DICOM, HL7 or PACS
    • G16H 30/40 ICT specially adapted for processing medical images, e.g. editing
    • G16H 40/67 ICT specially adapted for the remote operation of medical equipment or devices
    • G16H 50/30 ICT specially adapted for calculating health indices; for individual health risk assessment
    • A61B 3/0016 Operational features of apparatus for testing or examining the eyes
    • H04N 23/611 Control of cameras or camera modules based on recognised parts of the human body

Definitions

  • the present invention relates to a novel method of capturing ocular media of a patient of sufficient quality to permit medical diagnosis. It involves a system capable of capturing media, in combination with any variety and combination of sensors and feedback mechanisms and devices that provide feedback to a user to guide them to reliably collect high-quality media. It also involves a process whereby a user and/or patient is guided to capture the desired ocular media, and that media is made available for review by a physician or certified professional.
  • Patients are advised to visit an eye care specialist, such as an optometrist or ophthalmologist, every year to get their retina examined; however, very few of them actually get the screening done.
  • the ability to screen for retinal diseases at the primary care site would allow for large cost savings by detecting eye diseases early and implementing any interventions thereby alleviating dependence on higher-cost specialists for basic vision diagnostics.
  • Retinal examinations can be separated into a data acquisition portion, that is performed at the patient site, and a data interpretation portion, that can be performed for example, by remote specialists with or without the aid of machine learning (artificial intelligence).
  • media, i.e., an image
  • the device is operated by a local user who receives guidance from a system that is able to view and interpret the data captured in real-time.
  • a user is guided by cues or signals, such as auditory instructions, visual indicators, and/or haptic feedback.
  • the algorithm initiates the capture of the media, whereas in other cases, a user initiates the capture.
  • the algorithm may determine if the media is suitable, and the media may be stored locally or in cloud services where it can then be retrieved for interpretation at a later point in time.
  • In this invention we disclose a method and a system that is able to guide a user in real-time on how to position the device in order to capture good ocular media.
  • the method and system may be able to assess the suitability of the media captured, and provide additional steps or guidance to a user to capture better media.
  • FIG. 1 is a block diagram illustrating an overview of a guidance system
  • FIG. 2 is a block diagram showing how multiple inputs can be used to train an instruction model (Logic Algorithm used in FIG. 5);
  • FIG. 3 is a block diagram showing possible data flow for determining the best human interpretable instructions to give to a user
  • FIG. 4 is a high-level view of an algorithm that determines the best interpretable instructions to give a user;
  • FIG. 5 is a detailed view of an algorithm that determines the best interpretable instructions to give a user;
  • FIG. 6 is an exemplary method of training an object detection model;
  • FIGs. 7A and 7B are an exemplary method of training a classification model, and the prediction process of a classification model;
  • FIG. 8 is a chart depicting an example of how to divide up the screening process into various phases
  • FIG. 9 is an example flowchart of media capture
  • FIG. 10 is an example breakup of instructions to a user and/or patient that can be given by the guidance system
  • FIGs. 11A-11E are an example showing how to find the focus region by using differential “cliffs” for each column/row;
  • FIG. 12 is a diagram showing “orders” of blood vessels.
  • the “1st order” blood vessel is the arcade (labeled with a “1”).
  • 2nd order blood vessels are the 2nd branch from a blood vessel (labeled with a “2”);
  • Blood vessels are considered 3rd order clarity if you can see 3 branches from a single blood vessel (labeled with a “3”); and
  • FIG. 13 is an example flowchart showing how instructions can be given to the user to guide the user to achieve predefined criteria.
  • ocular can refer to the retina, the fundus, optic disc, macula, iris, pupil, or other eye-related anatomical components.
  • media refers to an image frame, photo, video, audio, signals, sensor data, or other digital information.
  • Patient is a person or animal whose ocular features are being examined.
  • User is the person who is handling and guiding the device relative to the patient. The user may be the patient, or the user may be another person handling the device.
  • Specialist refers to someone or something that may be able to read and diagnose eye and retinal images and/or media, such as an optometrist, ophthalmologist, retinal specialist, or algorithm.
  • the term software encompasses firmware.
  • human interpretable instructions can refer to synthesized speech instructions, pre-recorded speech instructions, auditory beeps, flashing lights, vibrations or the like.
  • the device refers to a media-capturing (i.e., image capturing) device, such as an ophthalmoscope, retinal camera, or fundus camera.
  • the device may include a communication-enabled device and an optical hardware device.
  • the communication-enabled device is a device, such as a smartphone, that has access to modules, such as a camera, a speaker, a microphone, a vibration or haptic engine, and is capable of transmitting information through various wireless methods such as WiFi, Bluetooth, cellular/Long Term Evolution (LTE), or wired methods such as a Universal Serial Bus (USB), Universal Asynchronous Receiver/Transmitter (UART), Universal Synchronous/Asynchronous Receiver/Transmitter (USART), etc.
  • the optical hardware device is a device that consists of one or more optical lenses configured in such a manner to enable the camera to focus and capture images and videos.
  • a camera may be located on the communication-enabled device, or as a separate module that communicates with the device. In some configurations, the communication-enabled device and optical hardware device may be the same device.
  • storage media refers to any form of temporary or permanent electronic or non-electronic storage of data e.g. RAM, ROM, hard disc, firmware, EPROM, SSD, etc.
  • a user refers to the person holding and operating the device.
  • the patient is the person or animal of whom we wish to capture ocular media.
  • a user and the patient may be two separate living beings, or may be the same person.
  • a plurality of sensors are attached to the device, and computational capabilities receive input from the sensors and compute instructions to a user on how to move or position the device or how the patient should move or position the patient’s body or body parts.
  • computational capabilities receive input from the sensors and compute instructions to a user on how to move or position the device or how the patient should move or position the patient’s body or body parts.
  • the device can be any device with sufficient optical capabilities for imaging the eye and/or retina, such as devices in conformance with ISO 10940 or its equivalents, which contain one or more sensors and computational capabilities to analyze input from the sensors.
  • the device may be a hand-held, table-top, or kiosk ophthalmic camera.
  • the device could be attached to a harness that the patient wears on their chest, hips, back, arms, head, neck, or shoulders, etc.
  • the device could also be embedded into a VR headset.
  • the device could be attached to an external object such as a drone, stand, kiosk, etc.
  • When a patient wishes to get their retina examined, they can find a user in possession of the device who is available to perform the procedure.
  • the patient’s eye(s) may or may not be dilated.
  • the device may be used in a dark room to allow for natural dilation, or the device may have an enclosure around the patient’s eyes to create darkness, thereby allowing for natural dilation of the patient’s eyes.
  • This enclosure may need to touch the patient around their eyes sufficiently to significantly reduce the presence of external visible light entering the enclosure.
  • This enclosure could be around a single eye individually, or it could be around both eyes at once. If individually, the exam would occur one eye at a time. If the enclosure covers both eyes, both eyes could be examined at once or one eye at a time.
  • any of the plurality of sensors inside the enclosure could be used to detect target areas of the patient such that appropriate instructions could be given to a user regarding how to position the device.
  • Light in the far red wavelength, infrared, ultraviolet, or any wavelength of light not visible to the unaided human eye (such as radar, etc.) could be used in combination with a sensor capable of detecting the same light (far red, infrared, ultraviolet, x-ray, etc.).
  • ultrasonic or subsonic sound sensors may be used to detect the location of target areas of the patient.
  • any combination of far red, infrared, ultraviolet, or sonic sensors may be used to detect the location of target areas of the patient.
  • Many target areas of the patient may be detected. These include the regions around the eye, and inside the eye. Examples of target areas around the eye to detect would be eye socket(s), eye(s), eyelids, the nose, eyebrows, eyelashes, cornea, the iris, etc. Examples of target areas inside an eye would be the iris, the retina, the lens, the fovea, the optic nerve (disc), the macula, the blood vessels including but not limited to inferior arcade and superior arcade, etc.
  • any number of visible light sources (such as incandescent, halogen, fluorescent, LED, laser, etc.) in combination with a visible-light sensor can be used to view target areas of the patient.
  • ultrasonic or subsonic sound sensors may be used to detect target areas of the patient.
  • any combination of far red, infrared, visible light, ultraviolet, or sound sensors may be used to view the patient’s body parts.
  • These light sources can illuminate the region around the eye, or they can illuminate the eye directly, including being shined directly into the eye. Any visible light shined directly into the eye may conform with a light hazard protection standard such as ANSI Z80.36 or its equivalents.
  • the device utilizes visible light sources, i.e., light which is visible to the human eye.
  • the device utilizes light sources in wavelengths not visible to the human eye.
  • it may utilize any combination of light sources, whether visible to the human eye, not visible to the human eye, or both.
  • the target areas of the patient may be detectable to various sensors through various means.
  • a distance sensor may be used to detect the distance between the device and a target area of the patient.
  • the distance sensor may use any form of light (far red, infrared, visible, ultraviolet, etc.) or sound (subsonic, sonic, ultrasonic, etc.) to achieve its range-finding capabilities.
  • the distance sensor would detect only the distance to the area directly in front of it.
  • the distance sensor could be an image sensor array (such as a Complementary Metal-Oxide Semiconductor (CMOS) light-sensitive circuit, a Charge-Coupled Device (CCD) light-sensitive circuit, etc.) or a sound sensor array.
  • a touch sensor such as a physical electric switch, or a capacitive touch sensor, could be used to detect the immediate presence of a patient’s skin, cornea, eyelash, or other area.
  • a moisture sensor could detect the presence of skin via detection of sweat, or the cornea via detection of fluid on the cornea.
  • a force sensor such as an accelerometer could detect movement of the device in any direction, or a tension/compression sensor could detect contact with a patient, user, or other object.
  • a directional position of the device could be detected using a magnetometer, to detect either the earth’s magnetic fields, or an induced magnetic field.
  • temperature sensors could be used to detect the proximity of a heat source such as a patient, a user, or an induced heat source positioned so as to provide further guidance.
  • a positional sensor such as a Global Positioning System (GPS) unit could be used to determine the position of the device relative to the patient.
  • a camera light sensor array (CMOS, CCD, etc.) could also be used as a visual sensor.
  • This visual sensor array could detect far-red light, infrared, visible light, ultraviolet, etc. or any combination of light spectrums, such as with a multispectral camera.
  • any combination of camera sensor(s), light sensor(s), distance sensor(s), touch sensor(s), force sensor(s), magnetic sensor(s), temperature sensor(s), positional sensors, and/or moisture sensor(s) could be used to detect the relative position of the device with respect to the patient. These sensors could touch the patient or could not touch the patient.
  • the sensors could be placed outside or inside the device, and they could be placed in any arrangement at any angle so as to optimize the ability to detect the patient.
  • the sensor(s) could be in communication with a plurality of microprocessors in communication with each other or attached to a single microprocessor.
  • the microprocessor would then analyze the input from these sensors to determine an instruction regarding how to position, move, or operate the device. Instructions regarding how to move, operate, or position the device and/or patient would be delivered to a user and/or patient or to a system through one or more guidance systems.
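A minimal Python sketch of this sensor-to-instruction step, assuming a simple distance-plus-touch rule; the sensor fields, target working distance, and instruction strings are illustrative and not taken from the disclosure.

    from dataclasses import dataclass

    @dataclass
    class SensorFrame:
        """One sampling cycle of illustrative sensor readings."""
        distance_mm: float   # range-finder reading to the nearest target area
        touch: bool          # contact/capacitive sensor near the eye cup
        accel_g: tuple       # (x, y, z) accelerometer reading

    def compute_instruction(frame: SensorFrame,
                            target_distance_mm: float = 25.0,
                            tolerance_mm: float = 3.0) -> str:
        """Compare the measured working distance to the target distance and
        return a human interpretable instruction for the user."""
        if frame.touch:
            return "pull back"          # the device is touching the patient
        error = frame.distance_mm - target_distance_mm
        if abs(error) <= tolerance_mm:
            return "hold still"
        return "move closer" if error > 0 else "move back"

    # One pass of the guidance loop with fabricated readings.
    print(compute_instruction(SensorFrame(distance_mm=31.0, touch=False, accel_g=(0, 0, 1))))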
  • instructions from the guidance system could be communicated to a system of actuators (electrical, air, chemical, or hydraulic) in communication with the camera(s) so as to move the camera(s) into position that will allow for appropriate media to be captured of the patient’s eye.
  • the guidance system analyzes the camera feed from one camera, and moves a camera along at least 2 axes until the camera is positioned so as to clearly see the retina of the patient.
  • the guidance system analyzes the camera feed from two or more cameras, and moves the primary camera along at least 2 axes until the camera is positioned so as to collect media of the desired region of the eye, while the other camera(s) are positioned around the primary camera to observe the area around the primary camera and between the primary camera and the patient and thereby provide additional information needed.
  • Camera movement can be achieved using electrical, pneumatic and/or hydraulic actuators.
  • distance, positional, and force sensors and/or additional camera(s) are used by the guidance system to determine how to actuate the primary camera.
  • actuators move the light source that illuminates the area to be collected.
  • actuators move the primary camera as well as any number of sensors or illumination sources in communication with the guidance system.
  • actuators move a substance delivery tool so as to administer a substance (such as a mydriatic) to aid in the collection of eye media.
  • media is collected using multiple cameras at various angles instead of just from the primary camera.
  • the user may turn on (activate) the device.
  • the device may be wirelessly connected (automatically or with the aid of the user) to a remote or local support system via the internet.
  • the user is connected with a guidance system on the device.
  • the guidance system may be located on a remote server.
  • the user may additionally be connected to a remote secondary advisory user (described in WO2020/214612 entitled “DEVICE NAVIGATION AND CAPTURE OF MEDIA DATA” which was filed on April 14, 2020).
  • Upon activation by the user, the device commences a communication pathway between the user and the guidance system.
  • the communication pathway is one-way from the guidance system to the user.
  • the communication is two-way between the user and the guidance system.
  • the communication is a multi-way communication pathway between the user, the guidance system, and potentially a secondary advisory user.
  • the guidance system is the secondary user.
  • the guidance system is a software application.
  • the guidance system is a circuit board.
  • any permutation of communication directionality and inclusion or exclusion of a secondary user is implemented.
  • the guidance system is both a software application and a circuit board.
  • the device can communicate with the user, secondary advisory user, or the guidance system via wireless (e.g. Wi-Fi, Bluetooth, Zigbee, cellular network, etc.) or wired communication methods, using any communication protocol (Transport Layer Security (TLS), Transmission Control Protocol (TCP), USART/UART, USB, Serial Peripheral Interface (SPI), Inter-Integrated Circuit (I2C), Controller Area Network (CAN), custom, etc.).
  • the guidance system can receive as input video captured by the device and may receive additional information, including but not limited to audio information and sensor data, such as gyroscopic, distance, touch, force, magnetic, temperature, light, positional, moisture, voltage, additional cameras, eye side, etc.
  • Sensor data can be located within the device, such as battery charge status, device operation mode, position and proximity sensors, light sensors and power meters, humidity readings, etc., or external to the device as communicated either through wired or wireless means.
  • the guidance system returns (or outputs) information back to the user.
  • the information can include auditory information, visual information, and/or haptic information and instructions.
  • auditory information can include pre-recorded messages verbally telling the user instructions, for example, “move up,” “move closer,” etc. Auditory information can also be pings, such as sonar noises that vary in volume, pitch, and/or frequency.
  • the audio commands are whatever the secondary user tells the user or the patient to do so that ocular media can be collected. These auditory commands can and often include commands like “move up,” “move closer,” etc.
  • FIG. 10 describes examples of the kinds of instructions delivered by the guidance system.
  • the guidance system may give any combination of instructions to a user and/or to the patient. As described in FIG. 10, these instructions include moving the device in the six cardinal directions (up, down, left, right, closer and further); rotating the device (pitch, roll and/or yaw) or giving the patient instructions on how to move their eye, such as looking higher, lower, looking left/right/straight/in the middle, looking at a target in front of the patient such as a light or a display with a movable point to fixate onto.
  • These instructions can be in real-time, while the user is moving the device, or they can be given before or after the user starts to use the device.
  • the instructions can be given audibly, visually, haptically, or any combination or permutation thereof, described below.
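A small Python sketch of how the instruction set above might be enumerated and fanned out to the audio, visual, and haptic channels; the instruction names and channel handlers are illustrative placeholders for a real speaker, display, and haptic engine.

    from enum import Enum, auto

    class Instruction(Enum):
        # Device movement instructions (rotations could be added the same way)
        MOVE_UP = auto(); MOVE_DOWN = auto(); MOVE_LEFT = auto(); MOVE_RIGHT = auto()
        MOVE_CLOSER = auto(); MOVE_FURTHER = auto()
        # Patient instructions
        LOOK_UP = auto(); LOOK_DOWN = auto(); LOOK_STRAIGHT = auto()

    def deliver(instruction: Instruction, channels=("audio", "visual", "haptic")) -> None:
        """Send one instruction over whichever feedback modalities are enabled."""
        text = instruction.name.replace("_", " ").lower()
        if "audio" in channels:
            print(f"[audio]  say: {text}")        # placeholder for speech playback
        if "visual" in channels:
            print(f"[visual] arrow: {text}")      # placeholder for an on-screen arrow
        if "haptic" in channels:
            print(f"[haptic] pattern: {text}")    # placeholder for a vibration pattern

    deliver(Instruction.MOVE_CLOSER, channels=("audio", "haptic"))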
  • the audio instructions may be given alone or in conjunction with visual instructions, with haptic instructions, or other types of feedback.
  • the audio instructions could be in English, or in any other language.
  • the audio instructions could be noises or sounds, such as a beep or ping that vary in length, frequency, pitch, and volume.
  • the audio instructions could be given in constant audio, or may have gaps and pauses and occasional audio when needed.
  • the audio instructions may be mixed with other audio, such as music.
  • the audio instructions could be given to the user from the device speakers, headphones connected to the device through physical wire or through wireless connections such as Bluetooth, Wi-Fi, etc., or other audio devices connected wirelessly or wired connected audio devices, such as Bluetooth headphones, Virtual Reality (VR) headphones, external speakers, etc.
  • Haptic information can include vibrations that the user can feel while holding the device without needing to look at or further interact with the device. These vibrations can vary in intensity, frequency, and length.
  • the haptic feedback could be 3 short and light vibrations, 3 short and hard vibrations, 1 long soft vibration, 1 long hard vibration, 1 soft short and 1 hard long vibration in rapid succession, etc.
  • These vibrations can be used to communicate any number of directional instructions such as 2 short soft means to move left, while 1 long soft means to move right, 3 hard long vibrations means to pull back and restart, etc.
  • the vibrations could also be used to communicate that certain mile markers have been reached. For example, 1 soft short vibration to indicate that the retina has been detected, 2 soft short vibrations to indicate that the optic nerve has been detected, 3 long vibrations to indicate that appropriate media has been collected, and the user is done with the current eye or with all imaging.
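A sketch of how the vibration encodings described above could be represented, assuming each pattern is a list of (duration, intensity) pulses; the driver function is a stand-in for a real haptic engine.

    # Each pattern is a list of (duration_ms, intensity) pulses, mirroring the
    # examples above: 2 short soft = move left, 1 long soft = move right,
    # 3 hard long = pull back and restart, milestone patterns, etc.
    HAPTIC_PATTERNS = {
        "move_left":            [(80, 0.3), (80, 0.3)],
        "move_right":           [(400, 0.3)],
        "restart":              [(400, 1.0), (400, 1.0), (400, 1.0)],
        "retina_detected":      [(80, 0.3)],
        "optic_nerve_detected": [(80, 0.3), (80, 0.3)],
        "media_collected":      [(400, 0.6), (400, 0.6), (400, 0.6)],
    }

    def vibrate(pattern_name: str) -> None:
        """Placeholder driver: a real device would forward each pulse to its haptic engine."""
        for duration_ms, intensity in HAPTIC_PATTERNS[pattern_name]:
            print(f"vibrate {duration_ms} ms at intensity {intensity}")

    vibrate("move_left")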
  • Visual information can include displaying images or video on a screen attached to the device and/or separate from the device, such as arrows indicating the direction to move the device, animations or live video demonstrations showing how to position the device relative to the patient, animations or live video demonstrations showing how to position, prepare, and/or administer drugs to the patient, animations or live video demonstrations showing the patient how to move their eye or other body parts.
  • Visual information could also display graphs to guide the user on the distance or location.
  • Visual information for positioning the device could also include video-game style instruction in response to the user’s motions, such as having the user try to accurately position a dot within a circle by moving the device up/down/left/right/closer/further relative to the patient; or “fly” a simulated airplane through various targets by moving the device up/down/left/right/closer/further.
  • Visual information could include flashes of light that vary in intensity, frequency, or color, and/or other visual cues to give guidance and or feedback to the user and patient.
  • the guidance system could be a software application installed onto a mobile device with at least a touchscreen, camera, speakers, haptic system, microprocessor, transient storage, and persistent storage.
  • the camera feed is analyzed by the guidance system to recognize the patient’s body parts relative to the device, and then audio, visual, and/or haptic feedback is provided to the user regarding how to move the device.
  • audio, visual, and/or haptic feedback is also given to the patient regarding how to move their eyes, eyelids and where to look.
  • when the guidance system recognizes appropriate ocular media has been collected by a camera or other sensor, it records the media to storage for later retrieval or delivery.
  • Additional embodiments can include variations wherein features of the above embodiment are not used, such as the haptic system being excluded, or the touchscreen being a display-only screen rather than a touchscreen. Or no visual feedback is provided, but rather only audio and haptic. Or any other embodiment where only one modality of feedback is provided or any combination of modalities is used to provide feedback to a user.
  • the guidance system is a hardware circuit board in communication with or installed into the mobile device.
  • Additional embodiments can include variations wherein any number, placement, or combination of additional sensors (distance, touch, moisture, force, magnetic, temperature, light, GPS, etc.) provide input to the guidance system.
  • accelerometer(s) can be added to the device in any number of combinations and orientations, and the input from the accelerometer can be smoothed using various high or low pass filters or smoothing algorithms to compute the positional change of the device and/or to compute how the device is oriented (upright, horizontal, vertical, facing down, etc.). The information may be used by the guidance system to give instructions to a user on how to move the device.
  • magnetometer sensor(s) could be used to determine the tilt and angle of the device with respect to the Earth.
  • any number of external magnetic field(s) could be placed at a known location(s) and orthogonal directionality(ies) relative to the patient.
  • the guidance system can then use the input from the magnetometer(s) to determine the location and tilt of the device relative to the earth or external magnetic field(s).
  • distance sensor(s) could be placed on the device with known orientation(s) relative to the camera. This(ese) sensor(s) could measure the distance from the device to the patient or the distance from the device to a known object (such as the operator, wall, or other mounted structure close to the patient).
  • the guidance system could be installed as software or a hardware circuit board into a non-mobile ophthalmic camera or kiosk, wherein the device used to collect the media of the eye and surrounding area is movable in its position relative to a patient.
  • the camera feed is analyzed by the guidance system to recognize target areas of the patient relative to the device, and then audio, visual, and/or haptic feedback is provided to a user regarding how to move the camera relative to the patient.
  • audio, visual, and/or haptic feedback may also be given to the patient regarding how to move their eyes, eyelids and where to look, etc.
  • when the guidance system recognizes acceptable media, it may automatically record the media for later retrieval or delivery, or it may instruct an external user to record the media.
  • the guidance system gives output to both a display facing a user and another display facing the patient.
  • the display to a user informs a user how to position the device, while the display to the patient informs the patient as to where to look or how to move their body.
  • the patient looks at a fiducial on the display and that fiducial could move up/down/left/right, thereby moving the patient’s eye.
  • both the patient display and a user display are mounted to the same device.
  • the patient display is separate from the device while a user display is mounted to the device.
  • a user display is separate from the device, while the patient display is mounted to the device.
  • both the patient and user displays are mounted separate from the device. Communication between the device and the displays can be either wired or wireless. If separate from the device, the displays may include a dedicated processor to aid in proper display of instructions from the guidance system, or may use the same processor.
  • a fiducial is placed on the patient such as a sticker or a laser pointer light shown on the patient.
  • the guidance system then tracks the device’s position relative to this fiducial and gives appropriate instructions to guide a user until acceptable media is collected.
  • the guidance system may guide a user through the following series of steps to capture media.
  • a. The patient is asked to look in a fixed direction. For example, look straight ahead.
  • b. A user approaches the device to the correct working distance. The correct working distance varies by device and patient.
  • c. The goal is to position the device at the right location, with the light focus spot located on the Patient’s lens or cornea, with the light illuminating inside the pupil.
  • the guidance system may guide a user into position, by giving a user information on how to optimally position the device to capture the desired media.
  • the guidance system may trigger the device to capture the desired media (images and/or video and/or data from various other inputs).
  • a user may get some feedback from the guidance system to indicate a successful examination has occurred.
  • the media is stored in a location that can be accessed at a later point.
  • the media may be captured by a visible light camera, such as a CMOS or CCD sensor.
  • This media could be in any number of single image formats such as tiff, raw, jpg, png, heic, dicom, etc.
  • this media could also be a multi-image media video such as mp4, mp3, mov, xvid, heic, dicom, etc.
  • This media could be a combination of both single-image and multi-image formats.
  • One or more media samples could be taken of a single patient’s eye, both eyes separately, or both eyes at the same time. These media samples can be filtered or unfiltered.
  • the filtering could exclude all media that do not include the retina.
  • the exclusion of all camera media that are out of focus.
  • the exclusion of all media where the optic disc or fovea are not in a predefined location.
  • the exclusion of all media where the optic disc or fovea are not in a predefined location and are not in focus. Any other combination of filters could be used based on clarity, patient areas present, target area position in the media, camera position, patient position, media quality characteristics, events occurring at the time of capture (move up, down, hold still, look up, etc.), etc.
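A minimal sketch of such a filter chain, assuming each frame carries metadata produced by the detection and quality models; the metadata keys and thresholds are illustrative.

    def has_retina(meta: dict) -> bool:
        return meta.get("has_retina", False)

    def in_focus(meta: dict, min_sharpness: float = 0.6) -> bool:
        return meta.get("sharpness", 0.0) >= min_sharpness

    def disc_in_target(meta: dict) -> bool:
        return meta.get("disc_in_target", False)

    def filter_media(frames, filters=(has_retina, in_focus, disc_in_target)):
        """Keep a frame only if every enabled predicate accepts its metadata."""
        return [f for f in frames if all(pred(f["meta"]) for pred in filters)]

    frames = [
        {"id": 1, "meta": {"has_retina": True, "sharpness": 0.8, "disc_in_target": True}},
        {"id": 2, "meta": {"has_retina": False, "sharpness": 0.9, "disc_in_target": False}},
    ]
    print([f["id"] for f in filter_media(frames)])   # -> [1]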
  • the device saves the captured media to a storage location.
  • This could be persistent storage (e.g. hard disc, flash, etc.), transient storage on the device (e.g. Random Access Memory (RAM), etc.). It could be located in removable storage (flash memory, magnetic storage, optical storage, etc.), or it could be located in remote storage (cloud storage provider, local network storage, etc.), each in communication with the device.
  • the computation described herein of the guidance system may occur locally on the device, on another nearby device, or remotely, such as in the cloud.
  • the guidance system is shown in FIG. 1 as reference number 9, which includes items 3, 4, 5 and 6.
  • the purpose of the guidance system is to receive all inputs (reference numbers 1 and 2 in FIG. 1) and process them to provide an output, to a user and/or patient (reference numbers 5 and 6 in FIG. 1), and ultimately the delivery of ocular media (reference numbers 7 and 8 in FIG. 1).
  • One or more algorithms can be used to accomplish the processing of the inputs. These algorithms may be used together to completely automate the process or separately as an aid to a user.
  • FIG. 13 shows an embodiment of FIG. 1 wherein the guidance system receives inputs (reference numbers 1 and 2 in FIG. 13) and processes them to give regular instructions to a user and/or patient (reference numbers 4, 5, 6, and 7 in FIG. 13), to guide the user to achieve predefined criteria (reference numbers 8, and 9 in FIG. 13) and communicate such status to the user (reference number 10 in FIG. 13).
  • the input media may be separated into individual images (such as a video into individual video frames).
  • the camera media may undergo any number of appropriate transformations, filters, etc. to prepare the camera media for processing.
  • the camera media could be cropped, scaled, downsampled, binned, reflected, sheared, rotated, reshaped to a desired aspect ratio, transformed to a different color space (such as greyscale), padded, etc.
  • the camera media may go through morphological, wavelet, gaussian, linear and non-linear transforms, for example to correct, remove, or introduce distortions, such as chromatic aberrations, spherical aberrations, pincushion/barrel/fisheye distortions, and other aberrations.
  • the camera media could undergo zero, one, or a multitude of filters and transforms, such as low pass filter, high pass filter, Fourier transforms, Gaussian, Laplacian, Hough transforms, denoising texturing, edge detection, etc. These transformations and filters may be applied to the entire camera media, or to any part of the camera media in isolation.
  • a part of camera media pre-processing may include removing parts of the camera media that are not necessary for analysis to reduce file size and speed up subsequent steps.
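A short sketch of a few of these pre-processing steps (centered square crop, greyscale conversion, and decimation), assuming the camera media is a NumPy array; a real pipeline would add the colour-space and filter transforms listed above.

    import numpy as np

    def preprocess(frame: np.ndarray, downsample: int = 2) -> np.ndarray:
        """Center-crop an H x W x 3 frame to a square, convert to greyscale,
        and downsample it."""
        h, w = frame.shape[:2]
        side = min(h, w)
        top, left = (h - side) // 2, (w - side) // 2
        square = frame[top:top + side, left:left + side]      # centered square crop
        grey = square @ np.array([0.299, 0.587, 0.114])       # luma greyscale
        return grey[::downsample, ::downsample]               # simple decimation

    frame = np.random.randint(0, 256, size=(480, 640, 3)).astype(np.float32)
    print(preprocess(frame).shape)   # (240, 240)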
  • many ISO 10940 cameras utilize spherical lenses to reduce curvature transformation when imaging the retina (which is also spherical), while many camera sensors utilize rectangular photosensitive arrays. This presents a central region in the middle of the camera media which is the focus of analysis (focus region) and is typically circular - though not required to be circular. This focus region can be identified using any number of means.
  • the camera media are cropped to a square, where the width of the camera media is the length of the shortest distance of the sensor, and the height is the same as the width, where the cropped frame is centered on the center of the camera media.
  • the location of the focus region in the camera media is previously known by the device model.
  • an edge-detection algorithm such as Canny, Deriche, Differential, Hough, Sobel, Prewitt, Roberts, etc. is used to find the outside edges of the focus region.
  • differentials are utilized to find two “cliffs” in the color intensity indicating the beginning and end of the focus region for each row or column (FIGs. 11A-11E).
  • FIG. 11A is an example vertical cross-section of the image that is taken to determine differential “cliffs”.
  • FIG. 11B is the corresponding pixel intensity for each pixel in the cross-section selected in FIG. 11A.
  • FIG. 11C is an example horizontal cross-section of the image that is taken to determine differential “cliffs”.
  • FIG. 11D is the corresponding pixel intensity for each pixel in the cross-section selected in FIG. 11C.
  • In FIG. 11E, the white circle is the focus region result if you repeat the process described in FIGs. 11A-11B and FIGs. 11C-11D over each column and each row.
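A sketch of the differential-cliff approach of FIGs. 11A-11E, assuming a greyscale NumPy image: for each row and column, the first and last large intensity differentials bound the focus region, and the intersection of the two passes approximates the circular region. The threshold value is illustrative.

    import numpy as np

    def row_cliffs(values: np.ndarray, threshold: float = 20.0):
        """Return the first and last index where the intensity differential along one
        row (or column) exceeds the threshold: the two cliffs that bound the bright
        focus region. Returns None when no cliff is found."""
        diff = np.abs(np.diff(values.astype(np.float32)))
        edges = np.nonzero(diff > threshold)[0]
        return (edges[0], edges[-1]) if edges.size else None

    def focus_region_mask(grey: np.ndarray, threshold: float = 20.0) -> np.ndarray:
        """Repeat the cliff search over every row and every column and keep pixels
        that fall between the cliffs in both directions."""
        rows = np.zeros(grey.shape, dtype=bool)
        cols = np.zeros(grey.shape, dtype=bool)
        for i, row in enumerate(grey):
            cliffs = row_cliffs(row, threshold)
            if cliffs:
                rows[i, cliffs[0]:cliffs[1] + 1] = True
        for j, col in enumerate(grey.T):
            cliffs = row_cliffs(col, threshold)
            if cliffs:
                cols[cliffs[0]:cliffs[1] + 1, j] = True
        return rows & cols

    # Synthetic check: a bright disc on a dark background is recovered by the mask.
    yy, xx = np.mgrid[0:200, 0:200]
    image = np.where((yy - 100) ** 2 + (xx - 100) ** 2 < 60 ** 2, 200.0, 5.0)
    print(focus_region_mask(image).sum() > 0)   # True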
  • template matching is employed to identify the focus region.
  • a deep learning model is trained using labeled data to identify the focus region.
  • the images are pre-processed with any number of color, denoising, noising, smoothing, or other computer-vision transformations before employing any of the previously described focus region detecting methods. Any number of the above-described embodiments can be combined for the purpose of identifying the focus region.
  • the focus region can be used as the input for processing or the input for more pre-processing.
  • the above pre-processing items can be applied to a single camera media or to groups of camera media.
  • the camera media may be reduced to minimize the size of the input camera media to the model. Smaller camera media typically enable the model to analyze the camera media faster. Additionally, camera media may also be increased in size using any number of interpolation techniques such as nearest neighbor, linear, polynomial interpolation, etc., in order to create additional information for the processing step. A certain pattern of camera media may also be removed from the analysis, for example, every other image (or video frame). This reduces the number of images that need to be analyzed, for example, from 30 to 15 images per second.
  • Different enhancements may be used in camera media pre-processing (FIG. 1 or FIG. 13, reference number 3) to help the algorithm in its function.
  • These enhancements include converting the camera media to various colorspaces (e.g. greyscale, red-free, blue-free, green-free, CIELAB, CMYK, YUV, HSV, Bayer variants of the aforementioned, etc.), and/or applying various transformations such as Bayer, Gaussian, Top Hat, Fourier transforms, etc.
  • the camera media is transformed to grayscale by reducing the number of dimensions by combining the various channels in the camera media to a single value.
  • one or more channels from CIELAB space may be used to isolate the feature of the camera media (such as the focus region, retina).
  • Additional inputs can be received from a variety of sensors (distance, touch, force, magnetic, temperature, light, positional, etc.) and these sensor inputs can also undergo signal pre-processing (FIG. 1 or FIG. 13, reference number 3) including smoothing and filtering.
  • Filtering can be performed using linear, non-linear, time-variant, time-invariant, causal, and/or discrete time filters. Smoothing can be accomplished using any number of smoothing techniques such as low and high pass filters being used in either or both the forward and backward directions. Both or either smoothing techniques can be used.
  • the input data is received in real-time, and in order to perform backward smoothing/filter operations, the guidance system takes regions of samples by count (5, 7, 10, 20, etc.).
  • the output from the various sensors could be inputted into one or more deep-learned models, which are trained to perform smoothing or any number of pre-processing operations.
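A minimal sketch of forward-backward smoothing over a buffered region of sensor samples, using a plain moving average in NumPy; a real system might instead use an IIR low-pass filter applied in both directions.

    import numpy as np

    def moving_average(x: np.ndarray, window: int = 5) -> np.ndarray:
        """Moving average; mode='same' keeps the output length equal to the input."""
        kernel = np.ones(window) / window
        return np.convolve(x, kernel, mode="same")

    def forward_backward_smooth(x: np.ndarray, window: int = 5) -> np.ndarray:
        """Apply the smoother forwards and then backwards over the buffered region."""
        forward = moving_average(x, window)
        return moving_average(forward[::-1], window)[::-1]

    # Example: a noisy distance-sensor trace buffered from the real-time stream.
    rng = np.random.default_rng(0)
    trace = 25.0 + rng.normal(0.0, 1.5, size=100)   # readings around 25 mm
    print(forward_backward_smooth(trace)[:5].round(2))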
  • the processing step can be any series of steps necessary to provide the instructions to a user and/or patient.
  • a single model receives all input and provides the output.
  • one or more models receive input, and their output is collated into another model that gives the final output.
  • the models could also be regression, clustering, biclustering, covariance estimation, composite estimators, cross decomposition, decomposition, gaussian mixture, feature selection, Gaussian process, linear, quadratic, discriminant, matrix decomposition, kernel approximation, isotonic regression, manifold learning, or classification models, object-detection models, support vector machines, Naive Bayes algorithms, k-nearest neighbors algorithms, dimensionality-reducing models, random forest models, decision tree models, tabular data machine learning models, etc. Any of these models can be used singularly, or in any ensemble method.
  • the term label or labelled is defined by a tag used to characterize the media. These labels can be bounding boxes, semantic segmentations, image classifications, keypoints, visual, audio or haptic annotations, etc. For example, an image may be labelled as "up”, while another can be labelled as "not up”.
  • a classification deep-leaming model could determine the acceptability of retinal media.
  • the goal of this model is to remove media within the stream that are not of the retina in general or do not show enough characteristics of the retina for a specialist, such as an ophthalmologist or an acceptability algorithm to properly work.
  • FIG. 7a illustrates how the model may be trained to predict a label regarding the current state of the device.
  • images are labeled into at least two categories such as having the label or not having the label. For example, an image with the retina in it could be labeled as “retina” while an image with no retina in it can be labeled as “not retina.” Labels can be any label and number of labels that are necessary to accomplish the purposes of the device.
  • the label can be a structure in the image (retina, fovea, eyebrow, eyelash, etc.), or an instruction to the user that will allow the device to achieve a predetermined criteria (up, down, left, further, etc.), a characteristic of the image (clear, blurry, 3rd-order-clarity, etc.), or any other label that is necessary.
  • FIG. 7b illustrates how that image can be given as an input to the model and the model outputs the predicted label.
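A toy sketch of the train/predict flow of FIGs. 7A and 7B, assuming scikit-learn and hand-crafted per-image statistics in place of a deep network; the labels and synthetic images are fabricated for illustration only.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def image_features(img: np.ndarray) -> np.ndarray:
        """Tiny feature vector: mean, standard deviation, and fraction of bright pixels."""
        return np.array([img.mean(), img.std(), (img > 128).mean()])

    rng = np.random.default_rng(0)
    retina_like = [rng.normal(120, 40, (64, 64)).clip(0, 255) for _ in range(20)]
    dark_like = [rng.normal(20, 10, (64, 64)).clip(0, 255) for _ in range(20)]

    X = np.array([image_features(i) for i in retina_like + dark_like])
    y = np.array([1] * 20 + [0] * 20)          # 1 = "retina", 0 = "not retina"

    model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

    # Prediction step (FIG. 7B): a new image is reduced to features and labelled.
    new_image = rng.normal(120, 40, (64, 64)).clip(0, 255)
    print("retina" if model.predict([image_features(new_image)])[0] else "not retina")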
  • a classification model or models may be used to label camera media that fall into different stages (FIG. 7b). The stages could represent various levels of media acceptability.
  • Stage 0 video frames could be frames that do not contain a blood vessel or retina. For example, these frames are mostly black, or show the area outside the patient’s eye including furniture or other aspects of the patient’s environment, etc.
  • Stage 1 frames are frames that do contain a blood vessel.
  • Stage 2 frames are frames in which at least 50% of the media is the retina.
  • Stage 3 frames are frames that contain at least 2 features. Features may include: superior arcade, inferior arcade, optic nerve or the fovea.
  • Stage 4 frames are frames that contain 2nd order blood vessels. 1st order blood vessels are the biggest and thickest blood vessels, like an arcade. 2nd order blood vessels are the blood vessels that branch from those blood vessels. If they branch one more time, then they are considered as 3rd order blood vessels.
  • 3rd order clarity is the presence of 3rd order blood vessels (see FIG. 12). Additional stages may be added or removed depending on the application. Certain stages may include media that are above or below a certain threshold. The threshold can be based on color content, brightness, contrast, sharpness, signal characteristics, algorithmic transforms, or other heuristics.
  • the output from the model may be used in different ways. For example, as a visual aid for a user.
  • the device may display the color red if the frame is Stage 0, yellow if it falls between Stage 1-3 and green if it is a Stage 4.
  • In another instance, it can be used to give an indicator to a user of how the collected media appears. Any other type of indicators can be given to a user, such as visual, auditory, or haptics, as previously described herein.
  • Another way that the model may be used is to automatically process the video by removing certain frames. In one instance, all stage 0 frames are removed. With this, we have the possibility of also removing other stages like 0-1, 0-2, 0-3, 1-2, 2-3, etc.
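A small sketch of both uses, assuming the classification model has already assigned a stage to each frame; the colour mapping and the minimum stage are illustrative.

    def stage_colour(stage: int) -> str:
        """Map a predicted stage to the on-screen indicator colour described above."""
        if stage == 0:
            return "red"
        if 1 <= stage <= 3:
            return "yellow"
        return "green"            # stage 4: 2nd order blood vessels visible

    def drop_low_stages(frames_with_stages, min_stage: int = 1):
        """Keep only frames whose predicted stage is at least min_stage
        (e.g. min_stage=1 removes all Stage 0 frames)."""
        return [frame for frame, stage in frames_with_stages if stage >= min_stage]

    predictions = [("f0", 0), ("f1", 2), ("f2", 4)]
    print([stage_colour(s) for _, s in predictions])   # ['red', 'yellow', 'green']
    print(drop_low_stages(predictions))                # ['f1', 'f2']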
  • a gradability or quality model could be used to determine the acceptability of ocular media.
  • the algorithm may be looking at specific pathology and features in the media.
  • This algorithm may be used as an aid to a person, such as an eye care specialist, an optometrist or ophthalmologist, or can be used to automatically grade images of the eye, such as the retina.
  • This model could be trained by using images, sensor, user, or patient inputted data.
  • the images, sensor, user, or patient inputted data may be labeled by certified specialists, such as ophthalmologists or optometrists, or labeled by other algorithms (e.g. synthetic data) or trained individuals.
  • This model can be trained to include detection of multiple pathologies and retrained to add more pathologies as needed.
  • the output of this model can be a label of individual video frames where a pathology was detected, a specific region of interest on a frame or signal, or an alert and escalation to the exam with a notice that the patient should be referred to a specialist for follow up.
  • the output may include the urgency of the referral, for example, “immediately”, “within the next few days”, “within the next 3 months”, etc.
  • a list of one or more specialists may be provided. The list may additionally contain the specialist’s contact information, location, distance from the patient's home, cost range, and next appointment availability. The output may also tell the patient that their scan looks normal, but they should continue to do tests at regular intervals.
  • the algorithm may be trained by labeling multiple pathologies or one at a time. There may also be multiple algorithms.
  • an object detection model could be trained and used to detect the location of zero, one, or any number of objects (anatomical features or landmarks) in an image frame (FIG. 6).
  • Objects that could be detected might include the iris, the retina, the optic nerve, the arcades, the fovea, microaneurysm, edema, scarring, pathology, eyelashes, eyelids, eyebrows, noses, a face, objects in the patient’s room, regions of the retina with particular attributes (3rd order clarity, disease), etc.
  • One model could detect all objects.
  • a different model could specialize in each object.
  • multiple models could be used, with each model specializing in a group of one or more objects.
  • one object detection model detects the iris, retina, optic nerve, fovea, and arcades, while another object detection model detects regions of 3rd order clarity (FIG. 12) and yet another detects pathology. While in another embodiment, one model detects the same features as all three or any combination of two of the three models.
  • a classification model could detect if a particular object is present, if region(s) of the ocular area with particular attributes is(are) present, or if a particular phase of the screening has been reached.
  • a classification model could be trained to detect whether or not an iris, retina, and/or optic nerve are present or detected in the image or sensor signals (FIG. 7a.).
  • a classification model could be trained to detect if regions of 3rd order clarity, or if regions of disease are present.
  • the screening process could be split into different phases such as described in FIG. 8, where each phase is defined by what eye structures are present.
  • Phase 0 could be the target phase when acceptable media is detected (the optic disc is in the correct location, and blood vessels with 3rd order clarity are present for two optic nerve diameters around the fovea).
  • Phase 1 is when the optic disc is present, but not in the correct location.
  • Phase 2 is when the retina is visible, and the optic disc or iris may still be partially visible.
  • Phase 3 is when the full iris is visible, and the retina may or may not be visible.
  • Phase 4 is when no ocular features are visible.
  • the phases include the presence of the fovea and/or disease, and more phases may be used or introduced with models trained to recognize each phase.
  • the phases are differentiated by other approaches, such as a model that can detect whether or not the media shows images inside the eye vs outside the eye.
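A sketch of the phase logic of FIG. 8, assuming the detection models return a set of structure names plus two boolean acceptability flags; the names and the exact acceptability test are illustrative.

    def screening_phase(structures: set, disc_in_place: bool = False,
                        third_order_clarity: bool = False) -> int:
        """Map detected eye structures to a screening phase (FIG. 8)."""
        if "optic_disc" in structures and disc_in_place and third_order_clarity:
            return 0          # acceptable media detected
        if "optic_disc" in structures:
            return 1          # disc present but not in the correct location
        if "retina" in structures:
            return 2          # retina visible, disc/iris at most partially visible
        if "iris" in structures:
            return 3          # full iris visible
        return 4              # no ocular features visible

    print(screening_phase({"retina"}))                            # 2
    print(screening_phase({"optic_disc", "retina"}, True, True))  # 0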
  • a classification model would receive inputs from one or more object detection models, classification models, pre-processed sensor outputs, and/or unprocessed sensor outputs and output an instruction to a user and/or patient (see FIGs. 2, 3, 4 and 5).
  • FIG. 2 shows the training process for a model that outputs an instruction to the user/patient (e.g. up, down, left, right, look up, etc.).
  • object detection model(s) analyze each training image that meets an intended instruction; statistics may or may not be computed about the image, and these values are inputted into the model with the target given a value of 1. Additionally, training images that do not meet the intended instruction are also analyzed and inputted into the model with the target instruction given a value of 0.
  • FIG. 3 shows how dedicated models trained for each instruction needed to allow the device to achieve its predefined criteria, could each be given the same video image frame and any desired metadata (statistics) and each would output their confidence that the user should move in the direction they were trained to evaluate. Logic could then be used to evaluate the outputs of each model to determine the instruction that should be given to the user to best allow the predefined criteria to be met.
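A sketch of that per-instruction ensemble logic, assuming each dedicated model returns a confidence in [0, 1] for its own instruction; the models here are stand-in lambdas.

    def choose_instruction(frame, models: dict, min_confidence: float = 0.5) -> str:
        """Score the same frame with every per-instruction model and pick the most
        confident instruction, or 'hold still' when no model is confident enough."""
        scores = {name: model(frame) for name, model in models.items()}
        best, confidence = max(scores.items(), key=lambda kv: kv[1])
        return best if confidence >= min_confidence else "hold still"

    # Fake per-instruction models returning a confidence in [0, 1].
    models = {
        "move up":     lambda f: 0.15,
        "move left":   lambda f: 0.82,
        "move closer": lambda f: 0.40,
    }
    print(choose_instruction(frame=None, models=models))   # "move left"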
  • one or more models could be trained to give all instructions needed for images of the left eye, while one or more models could be trained to give instructions needed for all images of the right eye.
  • one or more models could be trained to give instructions needed for the user, while one or more models could be trained to give instructions to the patient.
  • the number of models, the labels on which the models have been trained, and the combination and purpose of each model need not be limited so long as they allow for the computation of the best instruction that will allow the predefined criteria (FIG. 13, reference number 8) to be met.
  • a tabular data model would receive inputs from one or more object detection models, classification models, pre-processed sensor outputs, and/or unprocessed sensor outputs and output one or more instructions to a user and/or patient (FIG.
  • various outputs are received into logic with pre-set thresholds that are used to output an instruction to the patient and/or user.
  • a target region could be defined for the optic disc, and if the object detection model detects the center or edge of the optic disc to the left of and below the target region, logic could return the instruction that would shorten the largest distance the most (a minimal sketch of this target-region logic appears after this list).
  • the same logic could be used to provide instructions using a separate target region (e.g. retina, arcades, fovea, iris, nose, eyebrow, etc.).
  • Distance could be computed as the distance between the center of the structure and the center of its target, the distance between the center of the structure and the closest edge of the target, the distance between the center of the target and the closest edge of the structure, or the distance between the closest edge of the target and the closest edge of the structure.
  • the object detection model could provide input into a classification model that then outputs the phase.
  • an object detection model detects appropriate structures while a classification algorithm detects whether the device is inside or outside the eye region; the output of this away-from-eye classification model and the output of the object detection model are then input into another classification model, which determines an instruction to a user and/or patient (FIG. 5 shows one possible embodiment of such combinations of classification, object detection, and logic).
  • an object detection model could detect appropriate retina structures (e.g. retina, arcades, fovea, iris, disease, etc.) on a video frame, here named the structure model.
  • a second object detection model could detect regions with 3rd order clarity on the same video frame, here named the 3rd order model.
  • the locations of the 3rd order model's outputs on the retina could be determined; by doing the same on subsequent video frames, all the 3rd order clarity outputs can be mapped onto the retina, creating a union of regions of 3rd order clarity. If this union sufficiently covers a region of interest, an instruction of done can be given to a user and/or patient (a coverage sketch appears after this list).
  • Another embodiment of the above items would include using time or iterations to provide further input.
  • camera media and/or sensor media could be analyzed only after 10 frames or 0.5 seconds have passed. Any number of frames or any length of time could be used as the input.
  • Computation may occur for each collected media (image or sensor frame), for every 2nd media, for every 3rd media, etc., or for the media that occurs at every 0.1, 0.2, 0.5, 1, 2 second(s), etc. In other words, an image frame is picked at intervals of x seconds (see the frame-sampling and instruction-throttling sketch after this list).
  • Further logic can be applied to ensure that a user and/or patient is not overwhelmed with instructions.
  • the instructions could be delayed by being added to a list, and this list could be analyzed every 0.1, 0.2, 1, etc. seconds or every 1, 2, 5, 10, 20, etc. media.
  • the analysis would determine the final instruction that would be returned to a user.
  • This analysis could include taking an average; a time-, placement-, or value-weighted average; the median; the mode; or any number of other statistical analyses to determine the final instruction to the patient/user.
  • a threshold could be given that would prevent an instruction from being given to a user sooner than every 0.25, 0.5, 0.7, 1, 2, etc. seconds.
  • Further logic could also be applied to ensure that, before certain instructions are played to a user and/or patient, other instructions are played first. For example, before communicating to a patient to “look left” or “look right”, the device could communicate to a user to “pull back and restart” and, optionally, wait before finally communicating to the patient to “look left” or “look right”. Additionally, this instruction could be provided to a user and/or patient in real time, in series or simultaneously, with no delay or logical separation. As described previously, this output (or these outputs) could be communicated through audio, visual, and/or haptic feedback using a variety of mediums. Additionally, any delay in an instruction could be implemented based on time or based on conditions identified from the outputs of sensors, camera(s), and/or models.
  • a combination of computer vision model(s), deep learning model(s), and/or logic is used to help a user take or collect acceptable media.
  • This algorithm may use the gyroscope, accelerometer, and other sensors and information from the device, as well as the video feed, to provide prompts to a user on how to position the phone (described earlier).
  • the processing step may contain an algorithm that may be separated into two or more parts. For example, the first part may be getting to the retina and identifying the optic nerve, and the second part may be the smaller movements needed to get the retina in the right place.
  • a deep learning algorithm or other algorithms may be used to track the movement of the optic nerve.
  • Computer vision algorithms or other algorithms may be used to track items that can be presented to a secondary user to then relay to a user/patient.
  • computer vision tools may be used to detect features in the image which indicate a specific command.
  • these commands can be given at different time intervals and in different forms, including voice prompts, visual prompts (such as a game user interface), vibrations, haptics, or other methods.
  • prompts may be given based on the combined movements after a set amount of time or once a goal is reached.
  • Other structures or regions may be identified and tracked without limitation to the number, size, or type of structures or regions.
  • the algorithm may keep track of where any object is located in position space using any and all of the information it is provided. It may use simultaneous localization and mapping (SLAM) or other positioning algorithms to estimate the device’s relative position to the patient.
  • the Guidance System may output all the collected ocular media (FIG. 1, reference number 7) into any number of storage formats such as individual images, video, DICOM, json, xml, etc., or other single-media or multimedia formats.
  • the outputted ocular media could be one or more media each with a different purpose.
  • the device could output a video of all the camera media collected or collated into one video file.
  • the device could output a video of all the camera media that is considered “acceptable” or that has been filtered using predefined “acceptability” criteria (FIG. 13, reference numbers 8 and 9).
  • the device could output a single image that best meets a predefined set of “acceptability” criteria.
  • the device could output a video of all the media collected or collated into one video file, along with data containing pointers to particular frames or timestamps within the video file where media that meets predefined “acceptability” criteria is located (see the output-packaging sketch after this list).
  • any combination of the above output embodiments could be outputted.
  • the type, amount, and criteria of outputted ocular media and/or output data described here is not exhaustive and additional media or output data may be added.
  • the Guidance System may optionally or additionally output any variety of data, including data from the various sensors (distance, touch, moisture, force, magnetic, temperature, light, GPS, etc.) or computed values (pre-processed data, image or data statistics, model outputs, bounding boxes, classification labels, pointers to images that meet predefined criteria, etc.), and save the raw, pre-processed, or computed data to one or more files.
  • the format of these files may be text or binary, and may be a raw format or one that supports multiple labeled entries such as json, xml, csv, etc.
  • the Guidance System can continue to provide guidance to the user until predefined criteria are met (FIG. 13, reference numbers 8 and 9), at which point the user is notified (FIG. 13, reference number 10). Any number of criteria can be established.
  • the predefined criteria may be defined to include portions of the patient’s outer eye region, ensuring that the eyebrows, nose, iris, etc., are in focus either collectively or individually.
  • the predefined criteria may be based on any number of characteristics, whether used individually or together, including image clarity, image focus, image illumination, 1st, 2nd, 3rd, or 4th order clarity, image noise level, structures identified, a region or combination of regions that meets one or more of the previous items, etc. (collectively, characteristics).
  • predefined criteria may be defined to include any combination of the patient’s outer eye (nose, eyelid, eyebrows, iris, etc.) with the inner eye (retina, optic nerve, retinal reflex, etc.), ensuring that they are in focus or meet any other characteristic either collectively or individually.
  • predefined criteria may be defined to include portions of the patient’s inner eye, ensuring that the optic nerve, fovea, arcades, and/or quadrants of the eye or portions thereof are in focus or meet any other characteristic either collectively or individually.
  • the predefined criteria may include the union of bounding boxes across the previous media having 3rd order clarity that covers the circular region two optic disc diameters around the fovea.
  • the criteria could be the square region three optic disc diameters in length centered around the optic disc.
  • Another embodiment could include the square region 5 mm in length with the lower left corner starting 1 mm to the upper left of the optic nerve.
  • the criteria could include any number or combination of characteristics around or in relation to any number of combinations of internal or external eye structures.
  • Predefined criteria may also be defined including any combination of characteristics from non-camera data sensors (distance, touch, moisture, force, magnetic, temperature, light, GPS, etc.) in combination with the ocular media (a minimal sketch of one such combined criteria check appears after this list).
  • the device may notify the user (FIG. 13, reference number 10).
  • Predefined criteria may not be limited to one set of criteria, but may include any criteria or sets of criteria.
  • the method of notifying the user can be through any of the same methods or combination of methods previously described in providing human interpretable instructions to the user (visual, audio, haptic, etc.).
  • the user may indicate to the device to stop the guidance.
  • the user may not respond to the communication and continue with the guidance.
  • the device may automatically stop guidance and notify the user of such.
  • the device may automatically stop guidance and await the user’s input before making the next action.
  • achieving the predefined criteria may result in the user moving to the next eye or using the device on the same eye. In another embodiment, the user may cease using the device. In another embodiment, achieving a first set of predefined criteria may result in saving media, while achieving a 2nd set of predefined criteria may result in the user completing usage of the device on one eye and moving to another eye or finishing using the device altogether. In another embodiment, achieving a first set of criteria may result in communicating feedback to the user about their progress, achieving a 2nd set of predefined criteria may result in saving media, and achieving a 3rd set of predefined criteria may result in communicating to the user that a milestone has been reached. Any number or combination of criteria, responses to the criteria (e.g. delivering media, changing criteria, etc.), or communication methods with the user may be employed while using the device.
  • the device may deliver the media through a variety of means (FIG. 1, reference number 8).
  • the device may save the captured media to a persistent storage location and/or a transient storage on the device (e.g. Random Access Memory (RAM), etc.).
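The phase-based bullets above (FIG. 8) can be made concrete with a small amount of rule-based logic. The following is a minimal sketch only, not the claimed implementation: the FrameFindings fields and the screening_phase function are hypothetical names, and the detection flags and clarity-coverage value are assumed to come from upstream object detection and classification models.

```python
from dataclasses import dataclass

@dataclass
class FrameFindings:
    """Hypothetical per-frame outputs gathered from upstream models."""
    iris_visible: bool = False
    retina_visible: bool = False
    optic_disc_visible: bool = False
    optic_disc_in_target: bool = False  # disc detected inside its target region
    clarity_coverage: float = 0.0       # fraction of the two-disc-diameter zone
                                        # around the fovea with 3rd order clarity

def screening_phase(f: FrameFindings) -> int:
    """Map detected structures to a screening phase (0 = acceptable media,
    4 = no ocular features visible), following the phase definitions above."""
    if f.optic_disc_visible and f.optic_disc_in_target and f.clarity_coverage >= 1.0:
        return 0  # acceptable media detected
    if f.optic_disc_visible:
        return 1  # optic disc present but not yet in the correct location
    if f.retina_visible:
        return 2  # retina visible; disc or iris may still be partially visible
    if f.iris_visible:
        return 3  # full iris visible; retina may or may not be visible
    return 4      # no ocular features visible

# Example: retina found but the disc has not been located yet -> phase 2
print(screening_phase(FrameFindings(retina_visible=True)))
```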
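For the dedicated per-instruction models of FIG. 3, the surrounding logic might take roughly the following shape. Each entry in models is a hypothetical callable standing in for a trained classifier that returns a confidence in [0, 1]; the confidence threshold and the argmax selection are illustrative assumptions.

```python
from typing import Callable, Dict, Optional

# Each hypothetical dedicated model wraps a trained classifier: it receives the
# same video image frame plus optional metadata and returns its confidence
# (0..1) that its own instruction should be issued.
InstructionModel = Callable[[object, dict], float]

def choose_instruction(frame: object,
                       metadata: dict,
                       models: Dict[str, InstructionModel],
                       min_confidence: float = 0.5) -> Optional[str]:
    """Feed the frame and metadata to every dedicated model and return the
    instruction whose model is most confident, if any clears the threshold."""
    scores = {name: model(frame, metadata) for name, model in models.items()}
    best_name, best_score = max(scores.items(), key=lambda kv: kv[1])
    return best_name if best_score >= min_confidence else None

# Illustrative stand-ins for trained per-instruction models
models = {
    "move up":   lambda frame, md: 0.2,
    "move left": lambda frame, md: 0.8,
    "look up":   lambda frame, md: 0.4,
}
print(choose_instruction(frame=None, metadata={}, models=models))  # -> "move left"
```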
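The target-region bullet for the optic disc can be sketched as below. The Box class, the target coordinates, and the mapping from the largest axis offset to a single "move" instruction are assumptions for illustration; of the listed center-to-center, center-to-edge, and edge-to-edge distance definitions, only center-to-center is shown.

```python
import math
from dataclasses import dataclass

@dataclass
class Box:
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def center(self):
        return ((self.x_min + self.x_max) / 2.0, (self.y_min + self.y_max) / 2.0)

def center_to_center_distance(a: Box, b: Box) -> float:
    """One of the distance definitions listed above: center of the structure
    to the center of its target."""
    (ax, ay), (bx, by) = a.center, b.center
    return math.hypot(ax - bx, ay - by)

def instruction_toward_target(structure: Box, target: Box) -> str:
    """Return the single instruction that shortens the largest axis offset
    between the detected structure (e.g. the optic disc) and its target region.
    Image coordinates are assumed to increase rightward (x) and downward (y),
    and the direction labels are an assumed convention."""
    if (structure.x_min >= target.x_min and structure.x_max <= target.x_max and
            structure.y_min >= target.y_min and structure.y_max <= target.y_max):
        return "hold still"  # structure already inside the target region
    (sx, sy), (tx, ty) = structure.center, target.center
    dx, dy = tx - sx, ty - sy
    if abs(dx) >= abs(dy):
        return "move right" if dx > 0 else "move left"
    return "move down" if dy > 0 else "move up"

# Optic disc detected to the left of and below its target region
disc = Box(100, 400, 160, 460)
target = Box(300, 200, 420, 320)
print(instruction_toward_target(disc, target))  # -> "move right"
print(center_to_center_distance(disc, target))  # ~286 pixels
```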
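The union-of-3rd-order-clarity coverage test can be approximated on a sampling grid, assuming every frame's clarity boxes have already been registered into a common retina coordinate system (registration itself is outside this sketch). The grid step and required coverage fraction are illustrative assumptions.

```python
import math

def covers_fovea_zone(clarity_boxes, fovea_center, disc_diameter,
                      grid_step=1.0, required_fraction=1.0):
    """Return True when the union of 3rd-order-clarity boxes accumulated across
    frames covers the circular region two optic disc diameters around the fovea
    to at least `required_fraction`.  `clarity_boxes` is an iterable of
    (x_min, y_min, x_max, y_max) tuples already mapped into a shared retina frame."""
    cx, cy = fovea_center
    radius = 2.0 * disc_diameter
    total = covered = 0
    y = cy - radius
    while y <= cy + radius:
        x = cx - radius
        while x <= cx + radius:
            if math.hypot(x - cx, y - cy) <= radius:  # sample point inside the zone
                total += 1
                if any(bx0 <= x <= bx1 and by0 <= y <= by1
                       for bx0, by0, bx1, by1 in clarity_boxes):
                    covered += 1
            x += grid_step
        y += grid_step
    return total > 0 and covered / total >= required_fraction

# Boxes gathered from several frames around a fovea at (0, 0), disc diameter 10
boxes = [(-25, -25, 0, 25), (0, -25, 25, 25)]
print(covers_fovea_zone(boxes, (0.0, 0.0), 10.0))  # -> True, so "done" can be issued
```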
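The timing rules (analyzing only every n-th frame, queuing candidate instructions, and never addressing the user more often than a minimum interval) could be organized as in the following frame-sampling and instruction-throttling sketch; the class name, the default intervals, and the majority-vote aggregation are assumptions, not requirements of the disclosure.

```python
import time
from collections import Counter, deque
from typing import Optional

class InstructionThrottle:
    """Analyze only every n-th frame, queue candidate instructions, and emit at
    most one aggregated instruction every `min_gap_s` seconds."""

    def __init__(self, analyze_every_n: int = 3, min_gap_s: float = 0.5,
                 buffer_size: int = 10):
        self.analyze_every_n = analyze_every_n
        self.min_gap_s = min_gap_s
        self.buffer = deque(maxlen=buffer_size)
        self.frame_count = 0
        self.last_emit = float("-inf")

    def should_analyze(self) -> bool:
        """Called once per incoming frame; True only for every n-th frame."""
        self.frame_count += 1
        return self.frame_count % self.analyze_every_n == 0

    def submit(self, instruction: str) -> Optional[str]:
        """Queue a candidate instruction; if enough time has passed, return the
        most common queued instruction (a simple majority vote) and reset."""
        self.buffer.append(instruction)
        now = time.monotonic()
        if now - self.last_emit >= self.min_gap_s:
            self.last_emit = now
            winner, _ = Counter(self.buffer).most_common(1)[0]
            self.buffer.clear()
            return winner
        return None

throttle = InstructionThrottle()
for candidate in ["move up", "move up", "move left", "move left", "move left", "move left"]:
    if throttle.should_analyze():
        emitted = throttle.submit(candidate)
        if emitted:
            print(emitted)  # the throttled, aggregated instruction for the user
```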
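Packaging the collated video together with pointers to the frames that met the "acceptability" criteria might look roughly like this output-packaging sketch. The file names, the JSON layout, and the choice to store both frame indices and timestamps are illustrative assumptions; DICOM, XML, or the other containers described above would serve equally well.

```python
import json
from dataclasses import dataclass, asdict
from typing import List

@dataclass
class FrameRecord:
    index: int            # position of the frame within the collated video
    timestamp_s: float    # capture time relative to the start of the session
    acceptable: bool      # whether the frame met the predefined criteria
    clarity_score: float  # example of a computed characteristic

def write_session_outputs(records: List[FrameRecord],
                          video_path: str = "session.mp4",
                          metadata_path: str = "session.json") -> None:
    """Write a metadata file that points into the collated video and flags the
    frames that satisfied the predefined "acceptability" criteria."""
    payload = {
        "video": video_path,
        "acceptable_frames": [asdict(r) for r in records if r.acceptable],
        "all_frames": [asdict(r) for r in records],
    }
    with open(metadata_path, "w") as fh:
        json.dump(payload, fh, indent=2)

records = [
    FrameRecord(0, 0.00, False, 0.31),
    FrameRecord(1, 0.10, True, 0.87),
    FrameRecord(2, 0.20, True, 0.91),
]
write_session_outputs(records)  # session.json now points at frames 1 and 2
```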
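Finally, one combined predefined-criteria check that mixes image-derived characteristics with a non-camera sensor reading is sketched below; the specific thresholds, the focus metric, and the acceptable distance range are assumptions chosen only to illustrate the combination.

```python
from dataclasses import dataclass
from typing import Set

@dataclass
class CriteriaInputs:
    focus_score: float             # e.g. a variance-of-Laplacian style sharpness value
    detected_structures: Set[str]  # structures reported by the object detection model
    clarity_coverage: float        # fraction of the region of interest with 3rd order clarity
    distance_mm: float             # reading from a non-camera distance sensor

def meets_predefined_criteria(c: CriteriaInputs) -> bool:
    """One illustrative set of predefined criteria: every image characteristic
    and the non-camera sensor reading must be satisfied together."""
    return (
        c.focus_score >= 100.0 and
        {"optic nerve", "fovea"} <= c.detected_structures and
        c.clarity_coverage >= 1.0 and
        10.0 <= c.distance_mm <= 30.0
    )

print(meets_predefined_criteria(CriteriaInputs(
    focus_score=140.0,
    detected_structures={"optic nerve", "fovea", "arcades"},
    clarity_coverage=1.0,
    distance_mm=18.0,
)))  # -> True, so the device may notify the user (FIG. 13, reference number 10)
```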

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • Ophthalmology & Optometry (AREA)
  • Physics & Mathematics (AREA)
  • Surgery (AREA)
  • Animal Behavior & Ethology (AREA)
  • Molecular Biology (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Biophysics (AREA)
  • Veterinary Medicine (AREA)
  • Primary Health Care (AREA)
  • Epidemiology (AREA)
  • Multimedia (AREA)
  • Radiology & Medical Imaging (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Business, Economics & Management (AREA)
  • Signal Processing (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

Disclosed is a method for capturing medical-grade optical images of a human eye that satisfy predefined criteria from a video stream, using an image capture device configured to capture a video stream, the image capture device comprising a processing unit and software executed by the processing unit to analyze the video stream, the image capture device further comprising storage media, hardware, and software capable of outputting human-interpretable instructions. The image capture device captures a video stream of at least one of the patient's eyes, a processor selects an image frame from the video stream, and the image frame is evaluated to determine whether it satisfies predetermined criteria. If the image frame does not satisfy the predefined criteria, the processing unit determines a repositioning of the image capture device necessary to achieve the predefined criteria and outputs a human-interpretable instruction.
PCT/US2025/012876 2024-01-24 2025-01-24 Système et procédé d'acquisition de milieux oculaires Pending WO2025160345A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202463624578P 2024-01-24 2024-01-24
US63/624,578 2024-01-24

Publications (1)

Publication Number Publication Date
WO2025160345A1 true WO2025160345A1 (fr) 2025-07-31

Family

ID=96545880

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2025/012876 Pending WO2025160345A1 (fr) 2024-01-24 2025-01-24 Système et procédé d'acquisition de milieux oculaires

Country Status (1)

Country Link
WO (1) WO2025160345A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210383516A1 (en) * 2020-06-08 2021-12-09 Guangzhou Computational Super-Resolution Biotech Co., Ltd. Systems and methods for image processing
US20230218159A1 (en) * 2020-06-11 2023-07-13 Eadie Technologies Inc. System, Method, and Head-Mounted Device for Visual Field Testing
US20230316522A1 (en) * 2016-10-13 2023-10-05 Translatum Medicus, Inc. Systems and methods for processing, storage and retrieval of ocular images
US20240013431A1 (en) * 2022-07-08 2024-01-11 Warby Parker Inc. Image capture devices, systems, and methods

Similar Documents

Publication Publication Date Title
CN114980810B (zh) System for detecting a person's movement processes and/or vital sign parameters
US9445713B2 (en) Apparatuses and methods for mobile imaging and analysis
JP5607640B2 (ja) Method and apparatus for obtaining an image of eye features
US12239378B2 (en) Systems, methods, and apparatuses for eye imaging, screening, monitoring, and diagnosis
EP3721320B1 (fr) Communication methods and systems
WO2018201633A1 (fr) Fundus image-based diabetic retinopathy identification system
US20250228453A1 (en) Method and system for measuring pupillary light reflex with a mobile phone
CN111128382A (zh) Artificial intelligence multi-modal imaging analysis device
CN110944571A (zh) System and method for improving ophthalmic imaging
EP4134981A1 (fr) Lateral image acquisition method for ocular proptosis analysis, image capture device for implementing same, and recording medium
KR20220054827A (ko) System and method for evaluating pupillary response
US20180064335A1 (en) Retinal imager device and system with edge processing
CN104219992A (zh) Autism diagnosis assistance method and system, and autism diagnosis assistance device
KR102344493B1 (ko) Artificial intelligence-based smart nystagmus test system, method, and program
JP2018007792A (ja) Facial expression recognition diagnosis support device
JP2019526416A (ja) Retinal imaging device and system with edge processing
US20250261854A1 (en) Method and photographing device for acquiring side image for ocular proptosis degree analysis, and recording medium therefor
CN118279299B (zh) Method for capturing retinal images using a non-visible-light flash
WO2025160345A1 (fr) Système et procédé d'acquisition de milieux oculaires
CN116894805A (zh) System for identifying lesion features based on wide-angle fundus images
KR20220095856A (ko) Apparatus and method for diagnosing neuropsychiatric diseases and skin diseases
WO2025247410A1 (fr) Detection method and system for determining whether a patient suffers from strabismus and/or convergence insufficiency
KR101887296B1 (ko) Iris diagnosis system and stress diagnosis method of the system
KR20250151512A (ko) Detection and evaluation of pupil movement
CN116048255A (zh) Eye-movement human-computer interaction system based on face tracking for patients with minimally conscious state

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 25745764

Country of ref document: EP

Kind code of ref document: A1