WO2025231080A1 - Systems, devices, and computerized methods for tracking and displaying moving objects on mobile devices - Google Patents
- Publication number
- WO2025231080A1 (PCT/US2025/026999)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- images
- moving object
- coordinates
- interest
- points
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0487—Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
- G06F3/0488—Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
Definitions
- the present disclosure relates to systems, devices, and computerized methods for tracking and displaying moving objects on, for example, mobile devices.
- the tracking of moving objects can be applied towards a broad range of applications, for example, when evaluating an object's response to conditioning based on body and limb positions during a series of movements.
- When performing such evaluations, a user typically relies on visual inspection and/or manual tracking of body and limb positions in an image, and then compares the image to other image(s) to identify differences therebetween.
- the accuracy, reliability, and functionality of these techniques can be limited because they require a certain amount of human intervention, and they may be further limited by external factors such as, for example, the variability of objects, visual obstructions in images, and limits on viewing angles in the captured images, all of which can degrade the overall performance of the motion detection and analysis.
- the need for user input or feedback to analyze movement patterns and to make determinations on the significance of variances can render the process highly subjective and the reliability of any determinations made based on such analysis can demonstrate inconsistencies due to internal factors such as, for example, the user’s age, bias, experience, and other such factors.
- a system for tracking body kinetics includes a processor; and a non-transitory computer readable medium having stored therein instructions that are executable by the processor to perform operations including obtain a first dataset corresponding to a first set of images, the first set of images including a moving object; determine a visual environment based on the first set of images; analyze the first set of images to identify and track the moving object in the visual environment; identify points of interest in the first set of images representative of the moving object; and generate a first set of coordinates as output based on a position of the points of interest in the visual environment.
- the operations performed by the processor further including obtain a second dataset corresponding to a second set of images, the second set of images including the moving object; analyze the second set of images to identify and track the moving object in the visual environment; identify points of interest in the second set of images representative of the moving object; and generate a second set of coordinates as output based on the position of the points of interest in the visual environment.
- the operations performed by the processor further including determine a translation between the first set of coordinates and the second set of coordinates and generate a refinement dataset based on the translation.
- the refinement dataset corresponds to a translation from the second set of coordinates to the first set of coordinates.
- the operations performed by the processor further including obtain, from a second computing device, a third set of coordinates.
- the translation is further determined based on the third set of coordinates.
- the third set of coordinates corresponds to a movement of a prosthetic device in a real-life environment of the object, the movement of the prosthetic device being captured in the second set of images.
- the system further includes a user input device in communicable connection with the processor.
- the user input device includes a touch-screen interface integral with a display.
- the processor is configured to identify the moving object in image data based on regions selected using the user input device.
- the processor further performs operations including associate a first set of tracklets with the moving object based on the first set of images, the first set of tracklets corresponding to the points of interest of the moving object; determine a range of motion and a pattern of motion of the moving object based on a location of the first set of tracklets; associate a second set of tracklets with the moving object based on the second set of images, the second set of tracklets corresponding to the points of interest of the moving object; and determine a range of motion and a pattern of motion of the moving object based on the location of the second set of tracklets.
- the processor further performs operations including obtain a training dataset including image data corresponding to movement of one or more objects; and train a machine learning model based on the training dataset to enable determining sets of coordinates corresponding to the moving object in images.
- the processor identifies the moving object, the range of motion, and the pattern of motion of the moving object based on the training dataset.
- the refinement dataset is configured to refine a certain movement of a prosthetic device in a real-life environment of the object so that the certain movement is similar to the moving object in the first set of images.
- a computer-implemented method includes obtaining, by a first computing device, a first dataset corresponding to a first set of images, the first set of images including a moving object; determining a visual environment based on the first set of images; identifying and tracking the moving object in the visual environment based on the first set of images; identifying points of interest in the first set of images representative of the moving object; and generating a first set of coordinates as output based on a position of the points of interest in the first set of images.
- the method further including obtaining, by the first computing device, a second dataset corresponding to a second set of images, the second set of images including the moving object; identifying and tracking the moving object in the visual environment; identifying points of interest in the second set of images representative of the moving object; generating a second set of coordinates as output based on the position of the points of interest in the second set of images; and determining a translation between the first set of coordinates and the second set of coordinates and generating a refinement dataset based on the translation.
- the refinement dataset corresponds to a translation from the second set of coordinates to the first set of coordinates.
- the method further including receiving an input at a user input device to enable identifying the points of interest in the first set of images and the second set of images.
- the points of interest are identified in the first set of images and the second set of images based on the input.
- the user input device is in communicable connection with a processor of the first computing device.
- the user input device includes a touch-screen interface integral with a display of the first computing device.
- the method further including obtaining, from a second computing device, a third set of coordinates.
- the translation is further determined based on the third set of coordinates.
- the third set of coordinates corresponds to a movement of a prosthetic device in a real-life environment of the moving object, the movement of the prosthetic device being captured in the second set of images.
- the method further including associating a first set of tracklets with the moving object based on the first set of images, the first set of tracklets corresponding to the points of interest of the moving object; determining a range of motion and a pattern of motion of the moving object based on a location of the first set of tracklets; associating a second set of tracklets with the moving object based on the second set of images, the second set of tracklets corresponding to the points of interest of the moving object; and determining a range of motion and a pattern of motion of the moving object based on the location of the second set of tracklets.
- the method further including obtaining a training dataset including image data corresponding to movement of one or more objects; and training a machine learning model based on the training dataset to enable determining sets of coordinates corresponding to the moving object in images.
- the first computing device identifies the moving object, the range of motion, and the pattern of motion of the moving object based on the training dataset.
- the refinement dataset is configured to refine a certain movement of a prosthetic device in a real-life environment of the moving object so that the certain movement performed by the prosthetic device is similar to the movement of the moving object in the first set of images.
- FIG. 1 is a schematic diagram of a non-limiting example of a system, according to some embodiments.
- FIG. 2 is a schematic diagram of a non-limiting example of a system, according to some embodiments.
- FIG. 3 is a flow diagram of a method for training models for performing image processing techniques, according to some embodiments.
- FIG. 4 is a flow diagram of a method for performing the image processing techniques, according to some embodiments.
- FIG. 5 is a graphical diagram illustrating a non-limiting example of a type of movement which may be detected and tracked, according to some embodiments.
- FIG. 6 is a flow diagram of a method for performing the image processing techniques, according to some embodiments.
- FIG. 7 is a flow diagram of a method for performing the image processing techniques, according to some embodiments.
- Various embodiments of the present disclosure are directed to tracking of moving objects in images in near real-time using computer-based models including, for example, image processing models.
- the embodiments described herein may be provided in computing devices such as, for example, in mobile computing devices.
- the computing device may include a processor, a non-transitory computer readable media device such as, for example, a memory or hard-disk drive, and one or more sensor devices for capturing images such as, for example, a camera.
- the computing devices can include instructions stored in the memory such as, for example, software applications executable by the processor to perform operations in accordance with one or more embodiments described herein.
- the models utilized in these systems and computing devices may include, for example, machine vision models, machine learning models, AI models, and the like, that can replace vision/recognition tasks traditionally performed by humans, or tasks that traditionally needed a certain amount of human input or feedback, to perform the image processing techniques in real-time.
- the image processing techniques that may be applied by these models can include, according to some embodiments, processing images or a series of images to identify and track an object or objects in the images.
- the image processing techniques applied by the models can include processing the images to identify and track the movement of objects in a visual environment generated based on the images, to identify and track the movement of portions of these objects (e.g., arms, legs, wings, etc.), and to make predictions of a condition of the objects based on the analysis.
- the moving objects captured in these images may be in fields such as, for example, sports, physical therapy, physical conditioning, medical diagnostics, cell biology, astronomy, ornithology, equestrian sports, and the like.
- Such objects may include, for example, humans, horses, birds, reptiles, and other like objects.
- the objects captured in the images may be at a cellular level.
- the objects may include astronomical objects, vehicles, automobiles, motorcycles, bicycles, airplanes, drones, and other like objects.
- Systems including these models can apply the image processing techniques to track object(s) in the captured images or video such as, for example, movement of a person including one or more of their limbs including, but not limited to, head, neck, torso, arms, hands, fingers, legs, feet, portions thereof, or any combinations thereof.
- the images may be captured by an external device such as, for example, a handheld DSLR camera, and a computing device may obtain the images from the external device to be processed using the models described herein.
- the image processing techniques may be used to compare images of an object’s movement either between images or between sets of images, such as between images captured during different points in time to make predictions associated with the object based on the images.
- the system may process images capturing a person currently undergoing physical therapy performing a sequence of movements to make certain predictions of the type of injury or to provide recommendations for therapeutic exercises that can be performed to help correct for any identified issues.
- Various embodiments of the present disclosure relate to systems, devices, computer-implemented methods, and non-transitory computer readable media for performing the image processing techniques for tracking moving objects in images including tracking body kinetics.
- the techniques can include determining data corresponding to differences in the object’s movement based on a comparison of the images to reference data, and visually displaying the differences between the object’s movement in the images and the reference data.
- a mobile computing device (e.g., a mobile cellular phone) having a camera and a display may be used to capture images of a set of movements by an object, and the mobile computing device may process the images to identify differences between the object's movement in the images compared to reference data such as, for example, historical image data, and may visually display the differences on the display of the mobile computing device in near real-time.
- the image processing techniques for tracking of moving objects in images may include calculating proportional differences, rotational differences, acceleration differences, speed differences, other like data, or any combinations thereof, between the moving object in the images and the reference data.
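- As a non-limiting illustration (not taken from the disclosure), the sketch below shows one way speed, acceleration, and proportional (path-length) differences could be computed between a tracked point on the moving object and the same point in reference data; the frame rate, array shapes, and function name are assumptions, and rotational differences are omitted for brevity.
```python
import numpy as np

def movement_differences(captured: np.ndarray, reference: np.ndarray, fps: float) -> dict:
    """Illustrative sketch: compare one tracked point across frames.
    `captured` and `reference` are (T, 3) arrays of positions per frame,
    sampled at the same frame rate `fps` and containing the same number of frames."""
    dt = 1.0 / fps
    cap_vel = np.diff(captured, axis=0) / dt            # per-frame velocity vectors
    ref_vel = np.diff(reference, axis=0) / dt
    cap_speed = np.linalg.norm(cap_vel, axis=1)
    ref_speed = np.linalg.norm(ref_vel, axis=1)
    return {
        "speed_difference": cap_speed - ref_speed,                         # per-step speed deltas
        "acceleration_difference": np.diff(cap_speed) / dt - np.diff(ref_speed) / dt,
        "path_length_ratio": cap_speed.sum() / max(ref_speed.sum(), 1e-9)  # proportional difference
    }
```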
- the image processing techniques may be used to track an athlete's swing sequence in images captured during different time periods and to view differences in the athlete's swing sequence by displaying the images captured during one time period relative to images captured during another time period.
- the image processing techniques can include determining the object is demonstrating movements associated with certain physical issues based on comparing the object’s tracked movements from the captured images to reference data and determining data corresponding to differences between the object’s movement in the captured images and the reference data.
- the reference data may include data of similar movements by the same athlete.
- the reference data may include historical image data of the same object that is captured in the images being processed.
- the reference data may include historical image data of objects that are similar in type to the object captured in the images being processed. Based on the difference data generated from the object's tracked movements and on data corresponding to similar movements of objects from image data in the reference dataset, a prediction associated with the object can be made.
- the system may provide a prediction, based on the athlete’s tracked movements in the images and based on the processing of those images, that the athlete displays indications of decreased mobility in their leg due to, for example, a pulled hamstring muscle, or some other type of physical issue that affects the athlete’s movement.
- the image processing techniques for tracking of moving objects in images may include generating and providing display data to a display device, the display data corresponding to visual representations highlighting variances in the object’s movement compared to reference data such as, for example, in historical image data determined based on comparing differences between features extracted from the captured images to the reference dataset.
- the reference data may correspond to pre-captured image data of different object types including, for example, data of the certain moving object being analyzed in the images.
- the system may, based on the data corresponding to differences determined between the image data and the reference data, generate and output display data corresponding to the visual representations of real-time highlighting in the captured image or images of the moving object, or may display such data on a display in electronic communication with the system.
- the highlighting may, according to some embodiments, be indicative of one or more differences including, but not limited to, joint limit, joint speed, joint acceleration, joint distance, object speed, object acceleration, object distance, other indications, or any combinations thereof.
- the image processing techniques may be applied to images of a moving object to track the movement of the object, or a portion thereof, based on the images.
- Tracking the object may include determining one or more points of interest associated with the object based on the movement of the object in the images. Tracking the object may also include determining a location of the points of interest based on the images.
- tracking the object includes determining a virtual environment. In addition, tracking the objects can include determining a location of the points of interest in the virtual environment.
- the location of the points of interest may include, for example, coordinates along one or more axes including, for example, a first axis, a second axis, a third axis, other axes, or any combinations thereof.
- the points of interest can include appendages, limbs, torso, joints, other portions of the object, other objects, or any combinations thereof.
- the points of interest can include tracklets.
- the image processing techniques may be applied to a first set of images and a second set of images. The image processing techniques may be applied to the first set of images to determine a first set of coordinates for points of interest in the first set of images.
- the first set of images may capture a moving object during a first time period.
- the image processing techniques may also be applied to the second set of images to determine a second set of coordinates for points of interest in the second set of images.
- the second set of images may capture a moving object during a second time period, the second time period occurring after the first time period.
- the image processing techniques may include determining differences between the first set of coordinates and the second set of coordinates and determining a refinement dataset, the refinement dataset corresponding to a translation of the second set of coordinates to the first set of coordinates in the virtual environment.
- the refinement dataset may then be utilized to refine a position of the object in a real-life environment of the object from the second set of coordinates to the first set of coordinates. That is, the object movement as captured in the second set of images may be refined to be similar or substantially similar to the object movement in the first set of images.
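- As a non-limiting illustration of the translation described above, the following minimal Python sketch computes per-point displacement vectors between a second set of coordinates and a first set of coordinates; the array shapes, function name, and example values are assumptions for illustration only and are not taken from the disclosure.
```python
import numpy as np

def compute_refinement(first_coords: np.ndarray, second_coords: np.ndarray) -> np.ndarray:
    """Illustrative sketch: per-point translation vectors that would move each tracked
    point of interest from its second-set position to its first-set position. Both
    arrays are assumed to be shaped (N, 3), with row i holding the same point of
    interest in each set."""
    if first_coords.shape != second_coords.shape:
        raise ValueError("coordinate sets must track the same points of interest")
    return first_coords - second_coords  # the "refinement dataset" in this toy example

# Hypothetical example: three points of interest (e.g., shoulder, elbow, wrist).
first = np.array([[0.0, 1.0, 0.0], [0.2, 0.6, 0.0], [0.4, 0.2, 0.0]])
second = np.array([[0.0, 1.0, 0.0], [0.3, 0.7, 0.1], [0.6, 0.3, 0.1]])
print(compute_refinement(first, second))
```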
- the first set of images may be of a person's left arm movement performing a first action during a first period of time.
- the second set of images may be of the person's left arm prosthetic movement performing a similar second action during a second period of time.
- the image processing techniques may include generating refinement data corresponding to coordinate data, based on a coordinate system scheme associated with the prosthetic, to refine the movement of the person's prosthetic when performing the second action.
- an operation of the prosthetic device may be controlled by a computing device including a processor and a memory having stored therein instructions executable by the processor to enable the prosthetic device to perform movement operations, and the refinement data may be utilized by the computing device to refine the movement of the prosthetic device in the real-life environment of the user.
- the second set of coordinates may be determined based on the second set of images. In other embodiments, the second set of coordinates may be determined based on the second set of images and based on corresponding coordinate data of the second set of movements from the computing device associated with the prosthetic.
- the translating of the second set of coordinates to the first set of coordinates may be further refined based on the coordinate data from the computing device associated with the prosthetic device.
- the refinement data may be based on a coordinate system of the computing device, a coordinate system of the computing device associated with the prosthetic, or another coordinate system.
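- As a non-limiting illustration, refinement coordinates produced on the mobile device might be re-expressed in the prosthetic controller's coordinate scheme with a simple rigid transform; the rotation, translation, and scale values below are hypothetical calibration parameters, not values from the disclosure.
```python
import numpy as np

def to_prosthetic_frame(points: np.ndarray, rotation: np.ndarray,
                        translation: np.ndarray, scale: float = 1.0) -> np.ndarray:
    """Illustrative sketch: map (N, 3) points from the mobile device's coordinate
    scheme into a prosthetic controller's coordinate scheme, assuming the two
    schemes differ only by a rotation, an origin offset, and a unit scale."""
    return scale * (points @ rotation.T) + translation

# Hypothetical calibration between the two coordinate schemes.
rotation = np.eye(3)                       # assume the axes are already aligned
translation = np.array([0.05, 0.0, 0.10])  # made-up offset between origins (metres)
device_points = np.array([[0.2, 0.6, 0.0], [0.4, 0.2, 0.0]])
print(to_prosthetic_frame(device_points, rotation, translation))
```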
- FIG. 1 is a schematic diagram illustrating a non-limiting example of a system 100, according to some embodiments.
- System 100 may be a computing device such as, for example, a personal computing device associated with a user.
- system 100 may be a mobile computing device such as, for example, a smart cellular telephone, tablet, laptop, personal digital assistant (PDA), augmented reality (AR) device such as a headset, or other like devices.
- System 100 may include side 102 and side 104 opposite the side 102. In FIG. 1, both sides 102, 104 of system 100 are shown for simplicity purposes.
- System 100 may include one or more components therein including processor 106, memory 108, image sensor 110, and display 112. The one or more components of system 100 may be located in housing 114.
- the image sensor 110 may be located on a side of housing 114. In FIG. 1 , for example, image sensor 110 is shown located on side 104.
- system 100 may include the image sensor on side 102, the image sensor on side 104, or image sensors on both side 102 and side 104. In other embodiments, such as shown in FIG. 2, the system 100 may be in electronic communication with an external image sensor.
- the display 112 may also be located on a side of housing 114.
- FIG. 1 shows display 112 located on side 102 of housing 114.
- system 100 may include display 112 on side 102, side 104, or on both the sides 102, 104.
- the processor 106 may be a microprocessor, such as that included in a device such as a smart phone.
- the processor is configured to analyze images from the image sensor 110 to identify moving objects in the images, including slow-moving objects and fast-moving objects, to track the identified objects' movement between images, to determine differences in the objects' movement compared to reference data, and to display image data corresponding to visual representations of the differences.
- the display may show images from the image sensor with alert frames overlaid on an identified portion or portions of the moving object (e.g., limbs) indicative of differences in the object’s movement compared to reference data.
- the processor may be additionally configured to crop images or to mark certain areas of images and exclude those areas from analysis.
- the processor may receive input from a user input device, for example touch-screen input from the display 112, and, based on the input, determine areas to crop out or exclude from analysis.
- the types of alerts including audio components and the shape, color, or effects such as flashing of alert boxes on the display 112 may be selected.
- Memory 108 is a non-transitory computer readable data storage device.
- the memory 108 can be, for example, a flash memory, magnetic media, optical media, random access memory, etc.
- the memory 108 is configured to store data corresponding to images captured by the image sensor 110.
- the images captured by the image sensor 110 may include additional data such as timestamps, or may be modified by the processor, for example cropping the image files or marking zones of the image files as excluded from analysis by the processor 106.
- the image sensor 110 may be, for example, a digital camera.
- the image sensor 110 may capture a series of images over a time period.
- the image sensor 110 may capture video corresponding to a series of images over the time period.
- the image sensor 110 may also add additional data to the captured images, such as metadata, for example timestamps.
- the image sensor 110 may be an infrared camera or a thermal imaging sensor.
- the image sensor 110 is a digital sensor.
- the image sensor 110 includes an image stabilization feature.
- the frame rate and/or the resolution of the image sensor 110 affects a sensory range of the system. A greater frame rate may increase the sensory range of the system. A greater resolution may increase the sensory range of the system.
- Display 112 may be a display device or component which includes light emitting diodes (LED), organic light emitting diodes (OLED), liquid crystal display (LCD), and other like types of display devices.
- the display 112 can be a component of a smart phone or a tablet device.
- Display 112 receives processed image data from the processor 106 and displays the processed image data.
- the display 112 may include a user input feature, for example where the display is a touchscreen device.
- the input feature may be used, for example, to define regions of the images to exclude from analysis, or to select options and set parameters for those options such as distance and velocity thresholds for alarms or time windows for performing tracking operations.
- FIG. 2 is a schematic diagram illustrating a non-limiting example of a system 200, according to some embodiments.
- System 200 includes processor 202, memory 204, and multiple image sensors 206, which may be in electronic communicable connection with each other.
- the processor 202 and the memory 204 may be located separately from the multiple image sensors 206.
- System 200 may include a display device for displaying image data. In some embodiments, the display may be located separately from processor 202 and memory 204.
- System 200 may be in electronic communicable connection with one or more other computing devices such as, for example, system 100 in FIG. 1 , and the other computing device may display image data on its display.
- the processor 202, memory 204, and multiple image sensors 206 may be in electronic communicable connection through one or more different types of electronic connections.
- one or all of the processor 202, memory 204, and multiple image sensors 206 may also be in electronic communicable connection with other computing devices such as system 100 through one or more different types of electronic connections.
- the components of system 200 may be in electronic communicable connection through wired connections and/or wireless connections including, but not limited to, Wi-Fi, Bluetooth, Bluetooth Low Energy (BLE), 5G, 4G, LTE, CDMA, other wireless communication protocols, or any combinations thereof.
- the wired connections can include USB, ethernet, and the like.
- the multiple image sensors 206 may include a fixed camera.
- the camera can be mounted on a vehicle (e.g., land vehicle, water vehicle, or air vehicle).
- the camera can be mounted to a wired suspension assembly configured to move the camera along one or more axes to capture images of persons moving on a sporting field.
- the camera can be mounted on an aerial drone device.
- the camera may be portable and located in a housing that may be fixed to another object, such as a static object, for example a tree, or a movable object, for example a helmet.
- the camera may be located in a housing that may be hand-held.
- the multiple image sensors 206 may be part of a single three-dimensional (3-D) camera such as a stereoscopic camera.
- the distance to an object or its approximate size may be determined based on the images from each of the image sensors.
- there may be an additional user input tool, for example a keyboard and/or mouse, in communication with the processor 202, or integrated into components of the system 200, such as a display device similar to display 112 having touch-screen functionality.
- the display may be a two-dimensional display screen such as an LED, LCD, or OLED display.
- the display may be a virtual reality (VR) device such as a headset, or an AR device such as a head-mounted display with a translucent or transparent screen.
- FIG. 3 is a flow diagram of a method 300 for training models for performing image processing techniques, according to some embodiments.
- the method 300 includes establishing a visual environment in images.
- Establishing the visual environment may include identifying static or non-moving areas that are not tracked and identifying moving objects in the images for tracking.
- Establishing the visual environment may include classifying objects in the images.
- the visual environment may be the vertical and horizontal area captured by an image sensor such as, for example, image sensor 110 in FIG. 1 . That is, the visual environment may represent the visual range of the image sensor.
- the visual environment may correspond to the characteristics of images provided to a user via a display such as, for example, display 112 in FIG. 1.
- the images provided to the user via the display may include, for example, the field of view and the resolution of the displayed image.
- Identified static or non-moving areas may be determined by a lack of change in a portion of the visual environment (e.g., pixel or pixels) across multiple frames of the visual environment captured by the image sensor.
- the aspects of the static or non-moving areas that do not change may include the color, hue, shade or brightness of those areas.
- Static or non-moving areas may include pixels representative of the ground and fixed features such as, for example, trees, structural members, and the like, which are not moving relative to the image sensor.
- the image sensors may be at an elevated fixed position and may capture images of the moving objects and the ground during an application such as during a horse race. Static or non-moving areas do not trigger any alerts or tracking as will be further described herein.
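- As a non-limiting illustration, static or non-moving areas could be detected by checking how little each pixel changes across a stack of frames; the threshold value and function name below are assumptions for illustration, not the disclosed implementation.
```python
import numpy as np

def static_area_mask(frames: np.ndarray, change_threshold: float = 2.0) -> np.ndarray:
    """Illustrative sketch: mark pixels whose brightness barely changes across a
    (T, H, W) stack of grayscale frames as static/non-moving, so they can be
    excluded from tracking and never trigger alerts."""
    stack = frames.astype(np.float32)
    per_pixel_range = stack.max(axis=0) - stack.min(axis=0)
    return per_pixel_range < change_threshold  # True where the area is treated as static
```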
- the method 300 includes identifying objects in the visual environment of the images. Identifying the objects may include identifying regions in the image or images corresponding to the background and one or more moving objects in the image.
- the image may include one or more objects moving in the image or images.
- the objects may include, for example, first object, second object, and through nth objects captured in an image or series of images during a certain time period.
- one or more objects may be captured in images by an image sensor and displayed on a display device such that a target object for tracking may be identified based on one or more inputs received at an input device.
- the inputs may correspond to a user selection of the target object.
- the inputs may correspond to a region of the image including therein at least a portion of the image and the one or more techniques herein can analyze the selected region and identify the target object.
- Identifying the objects in the images may include identifying regions in the image or series of images corresponding to pixels representative of objects, or pixels representative of a target object among one or more objects. For example, a boundary may be identified in the images defining the object, the boundary including the object’s torso and any appendages. Identifying the objects may include classifying objects. For example, the images may capture more than one moving object in the image. In this regard, one or more objects and/or one or more different types of objects including a target object may be identified in the images for tracking. For example, the tracking may be performed on a particular object, e.g., athlete, in a group of other similar objects, e.g., other athletes, in the images.
- user input may be used to define regions which are excluded from the analysis. Excluded regions may include, for example, areas unlikely to contain an object of interest, for example the ground, areas with an excess of distractors such as a crowd in the background, or areas which are not useful to the user such as outside a playing surface.
- the user input for defining the region may be interaction with a display or user interface showing the view of the image sensor, with the input being a touch-screen input such as drawing a region to exclude or using two or more fingers to define the region to be excluded, or selection using a cursor such as one controlled by a mouse, track-pad, joy-stick or other such control interfacing with the processor and the display.
- the excluded regions may be removed from the portions of the image which are analyzed, may be treated as non-moving or static areas (e.g., no movement areas), or may be treated as distractors.
- the excluded regions may be cropped out of the image sensor images prior to their processing in establishing the visual environment.
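- As a non-limiting illustration, user-selected exclusion regions could be applied as a mask before the images are analyzed; the rectangular-region representation and function name are assumptions for illustration.
```python
import numpy as np

def apply_exclusions(frame: np.ndarray, excluded_regions: list) -> np.ndarray:
    """Illustrative sketch: zero out rectangular regions (x0, y0, x1, y1), selected
    for example via a touch-screen gesture, so they are treated as no-movement
    areas by the downstream analysis. `frame` is an (H, W) or (H, W, C) array."""
    masked = frame.copy()
    for x0, y0, x1, y1 in excluded_regions:
        masked[y0:y1, x0:x1] = 0  # excluded pixels contribute no change between frames
    return masked
```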
- the method 300 includes identifying joints for the target object and associating the joints with the moving object in the visual environment. Identifying the joints may include identifying, based on the region of pixels identified as representing the target moving object, tracklets corresponding to appendages and joints of the object identified based on rendering a frame of the object. For example, the joints may correspond to elbows, knees, ankles, and wrists of a person being tracked in the images.
- a target object may include, for example, a torso, a first appendage having a first joint associated therewith, a second appendage having a first and second joint associated therewith, and through an nth appendage having an nth joint associated therewith.
- Identifying the set of joints may include identifying, for each image, a set of tracklets based on a frame of the object, each tracklet in the set of tracklets being associated with a respective limb of the object or a segment extending between respective joints at the ends of the tracklet.
- identifying the joints may include determining a 2-D frame of the moving object based on the 2-D images, and rendering a 3-D frame of the object based on applying one or more algorithms to the 2-D frame to map the object in a 3-D coordinate system, the rendering process for generating the 3-D model being capable of compensating for occlusion due to using 2-D images as input.
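- As a non-limiting illustration, once a 2-D pose estimator has produced named joint coordinates for a frame, tracklets could be represented as the segments between connected joints; the skeleton definition, joint names, and function name below are assumptions for illustration and are not the disclosed rendering process.
```python
# Illustrative skeleton: each tracklet is the segment between two connected joints.
SKELETON = [("shoulder", "elbow"), ("elbow", "wrist"), ("hip", "knee"), ("knee", "ankle")]

def frame_tracklets(keypoints):
    """Illustrative sketch: `keypoints` maps a joint name to its (x, y) coordinates
    in one frame (e.g., the output of a pose-estimation model). Returns each
    tracklet as the pair of joint coordinates at its ends, skipping occluded joints."""
    tracklets = {}
    for joint_a, joint_b in SKELETON:
        if joint_a in keypoints and joint_b in keypoints:
            tracklets[(joint_a, joint_b)] = (keypoints[joint_a], keypoints[joint_b])
    return tracklets

# Hypothetical detections for one frame (pixel coordinates).
print(frame_tracklets({"shoulder": (320, 180), "elbow": (340, 260), "wrist": (350, 330)}))
```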
- the method 300 includes determining a pattern of motion of the target object based on a movement of the joints in the visual environment and generating an output dataset corresponding to the pattern of motion of the object.
- the pattern of motion may include a joint range of the moving object.
- Determining the joint range may include tracking a position of the object's joints and determining a pattern of motion of the target object based on determining rotational limits, acceleration, and speed of the joints. This pattern data, including the relative joint positions, rotational limits, acceleration, and speed of the joints, may be stored in the memory. Determining the joint range may also include, based on the pattern of motion of the joints, determining an overall speed relative to the real-life environment, acceleration data, and direction of the moving object. In some embodiments, determining the pattern of motion for the object may include calculating, between each image of the set of images, movement of the respective set of tracklets between each image frame to determine the pattern of motion for the object.
- Determining the joint range may include identifying areas or regions associated with the object and its appendages in an environment such as, for example, the visual environment, multi-dimensional coordinate system, or some other environment, where a rate of change in one or more aspects is greater than zero or a minimum value. In some embodiments, the rate of change may be determined based on exceeding a threshold value.
- the aspects which may change in the pixels of the object may include the color, hue, shade, and/or brightness of that area or object. In an embodiment, a determination is made that an object is moving based on whether one or more threshold values for the rate of change of the aspects of that area or object are satisfied.
- the size of the area of the visual environment where the aspects are changing may also be used to determine whether an object is moving and the rate at which the object or its appendages and joints are moving.
- the threshold values for rate of change and/or the area over which aspects change may, in an embodiment, be one or more predetermined values.
- the predetermined values may be selected, for example, based on the use case in which this method is applied, for example using one set of threshold values for an embodiment based on tracking a racehorse performing a sequence of movements, while an embodiment for use in monitoring flight patterns of birds has a different set of predetermined threshold values.
- the threshold value is selected (set) by a user via a user interface (e.g., a graphical user interface displayed on a display).
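- As a non-limiting illustration, the per-use-case threshold values described above could be stored and applied as follows; the threshold numbers and use-case names are made up for illustration.
```python
import numpy as np

# Made-up per-use-case thresholds: minimum per-pixel brightness change and minimum
# number of changed pixels for an area to count as "moving".
THRESHOLDS = {
    "racehorse":   {"rate_of_change": 8.0, "min_area": 400},
    "bird_flight": {"rate_of_change": 3.0, "min_area": 50},
}

def is_moving(prev_frame: np.ndarray, frame: np.ndarray, use_case: str) -> bool:
    """Illustrative sketch: decide whether enough pixels changed fast enough between
    two grayscale frames to satisfy the selected use case's predetermined thresholds."""
    limits = THRESHOLDS[use_case]
    change = np.abs(frame.astype(np.float32) - prev_frame.astype(np.float32))
    changed_area = int((change > limits["rate_of_change"]).sum())
    return changed_area >= limits["min_area"]
```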
- These moving areas may include large areas including therein one or more objects, such as in some examples, birds flying in a flock or stars moving across the sky.
- These objects may, in some embodiments, be user defined, for example through selection of regions via a user interface such as a touch screen. For objects, there may be an initial alert such as a presentation of a box around the object.
- the moving objects are tracked continuously. As each image is processed, the object corresponding to the target object is identified and tracked as the target object moves in the image or in each frame of a series of images. When monitored, objects or areas corresponding to the target object may be identified, and in some embodiments, a tracking symbol such as an alert box may be displayed over the target object.
- the method 300 includes recording training data corresponding to movement data generated based on tracking the target object in the images.
- the movement data may include, but is not limited to, joint position data, pattern data, rotational limits, accelerations, joint speed, joint direction, overall speed, overall accelerations, overall direction, metadata, other movement characteristics, or any combinations thereof.
- the data may include other types of data including object type classifications, species, gender, classifications of inanimate or non-moving objects, other definitions, or any combinations thereof.
- the training data may be utilized by the one or more techniques to enable performing the object tracking using one or more models as described herein.
- One or more data points generated based on the tracking may also be combined with the training data to iteratively update the reference data to provide improved object tracking functionality by the one or more techniques and one or more models.
- the models may be trained using the training data and updating the training data iteratively updates the models to provide improved functionality such as improved object classification, tracking, joint identification, pattern determination, and determination of differences between new images and the training data, as will be further described herein.
- any of the training data is used to train a machine-learning device, system, or both to produce an improved device, system, or both.
- any of the training data is used to train an artificial intelligence (Al) device, system, or both to produce an improved device, system, or both.
- the method 300 may include displaying images from the image sensor and/or any of the data generated based on performing the image processing techniques at a display device.
- the display may show the images from the image sensor, including an outline of the target object.
- the display may show the images and a frame associated with the target object, and data values corresponding to joint positions, speed, rotation, direction, and the like.
- FIG. 4 is a flow diagram of a method 400 for performing the image processing techniques, according to some embodiments.
- the method 400 includes obtaining an image or obtaining a series of images from an image sensor.
- the image sensor may correspond to, for example, image sensor 110 in FIG. 1 .
- the image sensor may correspond to at least one of the multiple image sensors 206 in FIG. 2.
- obtaining the image may include receiving a first dataset comprising image data corresponding to a plurality of images from at least one image sensor.
- the images may be captured during a period of time occurring after the period of time the images were processed at block 302. That is, the images and the processing of the images to generate the training data, as shown in FIG. 3, may correspond to historical image data captured and/or processed during a period of time occurring before the time period the images at block 402 were captured.
- the method 400 includes analyzing the image data to establish a visual environment based on the images.
- the method 400 for establishing the visual environment may be similar to operations performed at block 302 in FIG. 3, according to some embodiments.
- a model such as, for example, a computer vision model trained on the training data may be utilized to analyze the images and determine the visual environment based on the images.
- the method 400 includes identifying objects in the visual environment of the images, the objects including a target moving object.
- the target object may be identified from a plurality of different objects or different types of objects.
- the method 400 for identifying the objects in the visual environment may be similar to the operations performed at block 304 in FIG. 3, according to some embodiments.
- the model such as, for example, the computer vision model trained on the training data may be utilized to analyze the images and identify the objects in the visual environment.
- identifying the objects includes classifying the moving object or objects based on a comparison of the target objects to the training dataset.
- the method 400 may include obtaining the training dataset such as, for example, from block 310 in FIG. 3, to perform the object classification.
- the method 400 includes identifying joints of the target object in the visual environment of the images.
- the method 400 for identifying joints of the target object in the visual environment may be similar to operations performed at block 306 in FIG. 3, according to some embodiments.
- the model such as, for example, the computer vision model trained on the training data may be utilized to analyze the images and identify the joints of the target object as the object moves in the visual environment of the images.
- the method 400 may include identifying a set of skeletal joints corresponding to a body of the target object and corresponding to, for example, a torso and limbs of the target moving object.
- the method 400 includes identifying a joint range of the target object in the visual environment of the images.
- the method 400 for identifying the joint range of the target object in the visual environment may be similar to operations performed at block 308 in FIG. 3, according to some embodiments.
- the model such as, for example, the computer vision model trained on the training data may be utilized to analyze the images and identify the joint range of the target object as the object moves in the visual environment of the images.
- identifying the joint range may include tracking one or more factors including, but not limited to, a position, rotation, acceleration, and direction of the set of skeletal joints.
- the method 400 includes comparing the data generated based on processing the images at blocks 402, 404, 406, 408, and 410 to the training data.
- the training data may be obtained prior to, or during, block 412.
- the training data may correspond to, for example, training data generated at block 310 in FIG. 3.
- comparing the dataset to the training dataset further includes obtaining a set of second tracklets from the training dataset and comparing the set of tracklets determined at blocks 408 and 410 with the set of second tracklets to calculate the differences therebetween.
- the method 400 includes identifying differences between the data generated from tracking the target moving object at blocks 402, 404, 406, 408, and 410, relative to the training data. These differences may correspond to differences in a relative position of the object in the visual environment, differences in position of the object's torso and limbs, proportional differences, rotational differences, acceleration differences, speed differences, other differences, or any combinations thereof. Identifying these differences, or deltas, may include performing one or more calculations to identify these differences.
- the training data may include therein corresponding data associated with the target object, one or more different objects similar in type to the target object, one or more different types of objects than the target object, or any combinations thereof.
- determining that one or more of the differences between the data corresponding to the pattern of motion and the training data exceed a predetermined threshold value or values may be indicative of an anomalous movement sequence by the object.
- identifying that the differences between the data generated from tracking the target moving object and the training data exceed the predetermined threshold value or values may be indicative of physical issues with the object.
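- As a non-limiting illustration, the differences computed at this block could be checked against predetermined threshold values to flag a possibly anomalous movement sequence; the metric names, deltas, and limits below are hypothetical and not taken from the disclosure.
```python
def flag_anomalies(differences, thresholds):
    """Illustrative sketch: return the movement metrics (e.g., joint rotation, joint
    speed, limb position offset) whose difference from the training/reference data
    exceeds its predetermined threshold value."""
    return [metric for metric, delta in differences.items()
            if metric in thresholds and abs(delta) > thresholds[metric]]

# Hypothetical deltas between the tracked athlete and the reference data.
deltas = {"knee_rotation_deg": 14.0, "stride_speed_mps": 0.3, "hip_height_m": 0.02}
limits = {"knee_rotation_deg": 10.0, "stride_speed_mps": 0.5, "hip_height_m": 0.05}
print(flag_anomalies(deltas, limits))  # -> ['knee_rotation_deg']
```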
- the method 400 may include displaying images from the image sensor and/or any of the data generated based on performing the image processing techniques of method 400 at a display device.
- the display may show images from the image sensor including an outline of the target object and a visual representation of the differences between the object in the images as compared to the training data (e.g., differences in limb position, rotation, speed, directions, etc.).
- the display may show images obtained at block 402 overlaid with images from the training data and including data values corresponding to joint positions, speed, rotation, direction, and the like.
- the display may show a graphical user interface configured to receive inputs from a user to enable performing the image processing techniques in accordance with the present disclosure.
- displaying image data onto a display device may include displaying a visual indication of differences between the position and direction of the set of skeletal joints of the moving object based on a comparison to the reference dataset.
- FIG. 5 is a graphical diagram illustrating a non-limiting example of a type of movement which may be detected and tracked, according to some embodiments.
- images 502 captured by an image sensor or sensors such as, for example, image sensor 110 in FIG. 1 , may be displayed on display device 504.
- the x-axis 506 is horizontal with respect to the orientation of the image sensor
- the y-axis 508 is vertical with respect to the orientation and position of the image sensor
- the z-axis 510 is the direction from the image sensor to the tracked object.
- the axes of movement of detected objects are relative to the image captured by the image sensor or sensors.
- Each frame of the images may show the target object in a respective position such that the series of images may be translated to movement of the target object or movement of portions of the target object.
- the object's movement corresponds to movement of the object's appendages (limbs) and joints relative to the x-axis 506, y-axis 508, and z-axis 510.
- the display device 504 shows a visual environment 512 determined based on the images captured by the image sensor or sensors, and a target object 514 identified in the images for tracking.
- the display device 504 may also show, according to some embodiments, a mapping of a frame 516 and corresponding joints 518 associated with the object 514 in the visual environment 512.
- the joint range and pattern of movement of object 514 may be determined by calculating the translation of the joints 518 along a multi-axial coordinate system such as, for example, x-axis 506, y-axis 508, and z-axis 510 to determine factors such as, for example, position, proportion, rotation, speed, acceleration, direction, other like factors, or any combinations thereof.
- This data can then be used as training data to train models to perform the image processing techniques in accordance with the present disclosure, or compared to training data to determine differences in movement of the object 514 as compared to object(s) in the training data.
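- As a non-limiting illustration, the translation of a joint along the x-, y-, and z-axes could be converted into speed, acceleration, and direction by finite differences between frames; the function name and frame-rate parameter are assumptions for illustration.
```python
import numpy as np

def joint_kinematics(joint_positions: np.ndarray, fps: float) -> dict:
    """Illustrative sketch: `joint_positions` holds one joint's (x, y, z) position per
    frame, shaped (T, 3). Velocity, speed, acceleration, and direction are estimated
    by differencing consecutive frames."""
    dt = 1.0 / fps
    velocity = np.diff(joint_positions, axis=0) / dt         # (T-1, 3) vectors
    speed = np.linalg.norm(velocity, axis=1)                  # scalar speed per step
    acceleration = np.diff(velocity, axis=0) / dt             # (T-2, 3) vectors
    direction = velocity / np.maximum(speed[:, None], 1e-9)   # unit direction per step
    return {"velocity": velocity, "speed": speed,
            "acceleration": acceleration, "direction": direction}
```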
- FIG. 6 is a flow diagram of a method 600 for performing the image processing techniques, according to some embodiments.
- the method 600 includes obtaining an image or a set of images from an image sensor.
- the image sensor may correspond to, for example, image sensor 110 in FIG. 1 .
- the image sensor may correspond to at least one of the multiple image sensors 206 in FIG. 2.
- the images may be obtained from another computing device.
- obtaining the image may include receiving a dataset comprising image data corresponding to a plurality of images from the at least one image sensor.
- the method 600 includes analyzing the image data to establish a visual environment based on the images.
- the method 600 for establishing the visual environment may be similar to operations performed at block 302 in FIG. 3, according to some embodiments.
- a model such as, for example, a computer vision model trained on the training data may be utilized to analyze the images and determine the visual environment based on the images.
- the method 600 includes identifying objects in the visual environment of the images, the objects including a target moving object. In some embodiments, the target object may be identified from a plurality of different objects or different types of objects.
- the method 600 for identifying the objects in the visual environment may be similar to the operations performed at block 304 in FIG. 3, according to some embodiments.
- the model such as, for example, the computer vision model trained on the training data may be utilized to analyze the images and identify the objects in the visual environment.
- identifying the objects includes classifying the moving object or objects based on a comparison of the target objects to the training dataset.
- the method 600 may include obtaining the training dataset such as, for example, from block 310 in FIG. 3, to perform the object classification.
- identifying the object includes identifying one or more points of interest (or regions of interest) associated with the object. These points of interest may include, but are not limited to, a head, neck, torso, arms, legs, hands, fingers, feet, other parts, or combinations thereof.
- the regions of interest may include a torso, a left arm, and a right arm.
- the method 600 includes identifying joints of the target object in the visual environment of the images.
- the method 600 for identifying joints of the target object in the visual environment may be similar to operations performed at block 306 in FIG. 3, according to some embodiments.
- the model such as, for example, the computer vision model trained on the training data may be utilized to analyze the images and identify the joints of the target object as the object moves in the visual environment of the images.
- the method 600 may include identifying a set of skeletal joints corresponding to a body of the target object and corresponding to, for example, a torso and limbs of the target moving object.
- the method 600 includes identifying a joint range of the target object in the visual environment of the images.
- the method 600 for identifying the joint range of the target object in the visual environment may be similar to operations performed at block 308 in FIG. 3, according to some embodiments.
- the model such as, for example, the computer vision model trained on the training data may be utilized to analyze the images and identify the joint range of the target object as the object moves in the visual environment of the images.
- identifying the joint range may include tracking one or more factors including, but not limited to, a position, rotation, acceleration, and direction of the set of skeletal joints.
- the method 600 includes determining coordinates tracking the object's movement based on the images. That is, coordinates for the one or more points of interest can be tracked as the object moves in the set of images and based on processing of the images at blocks 602, 604, 606, 608, and 610.
- the points of interest may also be determined (e.g., identified and classified) based on the training data.
- the coordinates may correspond to the position of the points of interest in the visual environment at each image.
- the visual environment may be a virtual environment based on a coordinate system and the movement of the object in the images can be tracked using the coordinate scheme of the virtual environment.
- the coordinate scheme may be a coordinate scheme of a computing device performing the operations at blocks 602, 604, 606, 608, 610, and 612. In other embodiments, the coordinate scheme may be a coordinate scheme of another computing device.
- determining the coordinates further includes obtaining a second set of tracklets from the training dataset and comparing the set of tracklets determined at blocks 608, 610 with the second set of tracklets to calculate the differences therebetween.
- the method 600 includes identifying differences between the data generated from tracking the target moving object at blocks 602, 604, 606, 608, and 610, relative to the training data. These differences may correspond to differences in a relative position of the object in the visual environment, differences in position of the object’s torso and limbs, proportional differences, rotational differences, acceleration differences, speed differences, other differences, or any combinations thereof. Identifying these differences, or deltas, may include performing one or more calculations to identify these differences (a simplified, non-limiting sketch of such a calculation follows the description of method 600 below).
- the training data may include therein corresponding data associated with the target object, one or more different objects similar in type to the target object, one or more different types of objects than the target object, or any combinations thereof.
- one or more of the differences between the data corresponding to the pattern of motion and the training data exceeding a predetermined threshold value or values may be indicative of an anomalous movement sequence by the object.
- identifying that the differences between the data generated from tracking the target moving object relative to the training data exceed the predetermined threshold value or values may be indicative of physical issues with the object.
- the method 600 may include displaying images from the image sensor and/or any of the data generated based on performing the image processing techniques of method 600 at a display device.
- the display may show images from the image sensor including an outline of the target object and a visual representation of the differences between the object in the images as compared to the training data (e.g., differences in limb position, rotation, speed, directions, etc.).
- the display may show images obtained at block 602 overlaid with images from the training data and including data values corresponding to joint positions, speed, rotation, direction, and the like.
- the display may show a graphical user interface configured to receive inputs from a user to enable performing the image processing techniques in accordance with the present disclosure.
- displaying image data onto a display device may include displaying a visual indication of differences between the position and direction of the set of skeletal joints of the moving object based on a comparison to the reference dataset.
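As a non-limiting illustration of the comparison and thresholding operations described for method 600, the sketch below computes per-joint differences between tracked coordinates and reference (training) coordinates and flags joints whose mean deviation exceeds a predetermined threshold. The function names, joint labels, array layout, and threshold value are assumptions chosen for exposition and are not part of the disclosed system; Python and NumPy are used here purely for illustration.

```python
import numpy as np

def compute_joint_deltas(tracked, reference):
    """Compute per-frame positional differences between tracked coordinates and
    reference (training) coordinates.

    Both inputs map a joint name to an array of shape (num_frames, 3) holding
    x, y, z coordinates in the visual environment.
    """
    deltas = {}
    for joint, coords in tracked.items():
        ref = reference[joint]
        n = min(len(coords), len(ref))            # align on the shorter sequence
        deltas[joint] = np.linalg.norm(coords[:n] - ref[:n], axis=1)
    return deltas

def flag_anomalous_movement(deltas, threshold=0.15):
    """Return joints whose mean deviation from the reference data exceeds a
    predetermined threshold value (a possible anomalous movement sequence)."""
    return [joint for joint, d in deltas.items() if float(np.mean(d)) > threshold]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = {"right_elbow": rng.normal(size=(30, 3)),
                 "right_wrist": rng.normal(size=(30, 3))}
    # Tracked data: the wrist drifts away from the reference pattern.
    tracked = {"right_elbow": reference["right_elbow"] + 0.02,
               "right_wrist": reference["right_wrist"] + 0.5}
    deltas = compute_joint_deltas(tracked, reference)
    print(flag_anomalous_movement(deltas))        # e.g. ['right_wrist']
```

In practice the distance measure and the threshold would be chosen per use case, consistent with the predetermined or user-selected threshold values described elsewhere in this disclosure.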
- FIG. 7 is a flow diagram of a method 700 according to some embodiments.
- the method 700 may be an embodiment of blocks 602, 604, 606, 608, 610, and 612 of FIG. 6.
- the method 700 includes obtaining first image data.
- the first image data may include a first set of images.
- the first image data may include a first set of coordinate data.
- the first image data may include other data such as, for example, data generated while performing operations 602, 604, 606, 608, 610, 612, or any combinations thereof, from FIG. 6 on the first set of images.
- the first image data may correspond to data based on images captured during a first time period.
- the first image data may include images of a person performing certain movements with the right arm and a first set of coordinate data corresponding to the set of movements.
- the method 700 includes obtaining second image data.
- the second image data may include a second set of images.
- the second image data may include a second set of coordinate data.
- the second image data may include other data such as, for example, data generated while performing operations 602, 604, 606, 608, 610, 612, or any combinations thereof, from FIG. 6 on the second set of images.
- the second image data may include images captured during a second time period, the second time period occurring after the first time period.
- the second image data may include images of the person performing certain movements using a prosthetic device that has replaced their right arm, or a portion thereof.
- the method 700 includes determining a translation based on the first image data and the second image data.
- the first image data includes a first set of coordinates and the second image data includes a second set of coordinates. An illustrative, non-limiting sketch of determining such a translation is provided following the description of method 700 below.
- determining the translation includes determining a dataset to enable translating the second set of coordinates to the first set of coordinates in the virtual environment.
- the translation may be based on a coordinate system of the virtual environment. In other embodiments, the translation may be based on a coordinate system of another computing device.
- a prosthetic device may include a computing device such as, for example, the computing device shown at block 710, the computing device including a processor and a memory and including instructions for controlling an operation of the prosthetic device including coordinate data for certain movements, and the translation may be based on a coordinate system of the prosthetic device.
- the method 700 includes obtaining, from computing device 710, a third set of coordinates corresponding to a position of the object in the second set of images. The third set of coordinates may correspond to a position of the object in a real-life environment of the object.
- the method 700 further includes refining the translation of the second set of coordinates to the first set of coordinates based on the third set of coordinates.
- the method 700 includes determining refinement data.
- the refinement data may correspond to the translation data for refining the position of the object, or the points of interest associated with the object, from the second set of coordinates to the first set of coordinates. That is, the refinement data may include one or more data points to translate the second set of coordinates to the first set of coordinates.
- the refinement dataset may be based on a coordinate scheme of the virtual environment. In other embodiments, the refinement dataset may be based on a coordinate scheme of the other computing device such as, for example, computing device at block 710.
- the method 700 may include sending the refinement data to the computing device 710.
- the refinement data may further include other data including, but not limited to, the first set of images, second set of images, first set of coordinates, second set of coordinates, third set of coordinates, training data, other data related to determining the position of the object, or any combinations thereof.
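A minimal sketch of the translation and refinement operations of method 700 follows, assuming corresponding points of interest in the two coordinate sets and a simple mean-offset translation. The function names, the blending weight, and the use of a third, device-reported coordinate set are illustrative assumptions rather than the disclosed implementation.

```python
import numpy as np

def determine_translation(second_coords, first_coords):
    """Estimate a translation that maps the second set of coordinates onto the
    first set of coordinates in the virtual environment.

    Both arrays have shape (num_points, 3), with rows corresponding to the same
    points of interest. A simple mean offset is used here for illustration.
    """
    return np.mean(first_coords - second_coords, axis=0)

def determine_refinement_data(second_coords, first_coords, third_coords=None, weight=0.5):
    """Build refinement data translating the second set of coordinates toward the
    first set, optionally refined by a third set of coordinates reported by another
    computing device (e.g., a prosthetic controller)."""
    translation = determine_translation(second_coords, first_coords)
    if third_coords is not None:
        # Blend the image-derived estimate with the device-reported positions.
        device_translation = determine_translation(third_coords, first_coords)
        translation = (1.0 - weight) * translation + weight * device_translation
    return {"translation": translation,
            "refined_coordinates": second_coords + translation}

if __name__ == "__main__":
    first = np.array([[0.0, 1.0, 0.0], [0.2, 0.8, 0.1]])    # first time period
    second = first + np.array([0.05, -0.10, 0.02])           # second time period, offset
    refinement = determine_refinement_data(second, first)
    print(refinement["translation"])                          # approx. [-0.05, 0.10, -0.02]
```

A fuller implementation might estimate a rigid or affine transform rather than a pure translation; the mean offset is used only to keep the sketch short.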
- the term “between” does not necessarily require being disposed directly next to other elements. Generally, this term means a configuration where something is sandwiched by two or more other things. At the same time, the term “between” can describe something that is directly next to two opposing things.
- a particular structural component being disposed between two other structural elements can be: disposed directly between both of the two other structural elements such that the particular structural component is in direct contact with both of the two other structural elements; disposed directly next to only one of the two other structural elements such that the particular structural component is in direct contact with only one of the two other structural elements; disposed indirectly next to only one of the two other structural elements such that the particular structural component is not in direct contact with only one of the two other structural elements, and there is another element which juxtaposes the particular structural component and the one of the two other structural elements; disposed indirectly between both of the two other structural elements such that the particular structural component is not in direct contact with both of the two other structural elements, and other features can be disposed therebetween; or any combination(s) thereof.
- Clause 1. A system for tracking body kinetics comprising: a processor; and a non-transitory computer readable media having stored therein instructions that are executable by the processor to perform operations including: obtain a first dataset corresponding to a first set of images, the first set of images including a moving object; determine a visual environment based on the first set of images; analyze the first set of images to identify and track the moving object in the visual environment; identify points of interest in the first set of images representative of the moving object; and generate a first set of coordinates as output based on a position of the points of interest in the visual environment.
- Clause 2. The system of clause 1, wherein the operations performed by the processor further comprising: obtain a second dataset corresponding to a second set of images, the second set of images including the moving object; analyze the second set of images to identify and track the moving object in the visual environment; identify points of interest in the second set of images representative of the moving object; and generate a second set of coordinates as output based on the position of the points of interest in the visual environment.
- Clause 3. The system according to any of clauses 1-2, wherein the operations performed by the processor further comprising: determine a translation between the first set of coordinates and the second set of coordinates and generate a refinement dataset based on the translation, wherein the refinement dataset corresponds to a translation from the second set of coordinates to the first set of coordinates.
- Clause 4. The system according to any of clauses 1-3, wherein the operations performed by the processor further comprising: obtain, from a second computing device, a third set of coordinates; wherein the translation is further determined based on the third set of coordinates.
- Clause 5. The system according to any of clauses 1-4, wherein the third set of coordinates corresponds to a movement of a prosthetic device in a real-life environment of the object, the movement of the prosthetic device being captured in the second set of images.
- Clause 6. The system according to any of clauses 1-5, further comprising: a user input device in communicable connection with the processor.
- Clause 7. The system according to any of clauses 1-6, wherein the user input device comprises a touch-screen interface integral with a display.
- Clause 8. The system according to any of clauses 1-7, wherein the processor is configured to identify the moving object in image data based on regions selected using the user input device.
- Clause 9. The system according to any of clauses 1-8, wherein the processor further performs operations comprising: associate a first set of tracklets with the moving object based on the first set of images, the first set of tracklets corresponding to the points of interest of the moving object; determine a range of motion and a pattern of motion of the moving object based on a location of the first set of tracklets; associate a second set of tracklets with the moving object based on the second set of images, the second set of tracklets corresponding to the points of interest of the moving object; and determine a range of motion and a pattern of motion of the moving object based on the location of the second set of tracklets.
- Clause 10. The system according to any of clauses 1-9, wherein the processor further performs operations comprising: obtain a training dataset comprising image data corresponding to movement of one or more objects; and train a machine learning model based on the training dataset to enable determining sets of coordinates corresponding to the moving object in images; wherein the processor identifies the moving object, the range of motion, and the pattern of motion of the moving object based on the training dataset.
- Clause 11. The system according to any of clauses 1-10, wherein the refinement dataset is configured to refine a certain movement of a prosthetic device in a real-life environment of the object so that the certain movement is similar to the moving object in the first set of images.
- Clause 12. A computer-implemented method comprising: obtaining, by a first computing device, a first dataset corresponding to a first set of images, the first set of images including a moving object; determining a visual environment based on the first set of images; identifying and tracking the moving object in the visual environment based on the first set of images; identifying points of interest in the first set of images representative of the moving object; and generating a first set of coordinates as output based on a position of the points of interest in the first set of images.
- Clause 13. The computer-implemented method of clause 12, the method further comprising: obtaining, by the first computing device, a second dataset corresponding to a second set of images, the second set of images including the moving object; identifying and tracking the moving object in the visual environment; identifying points of interest in the second set of images representative of the moving object; generating a second set of coordinates as output based on the position of the points of interest in the second set of images; and determining a translation between the first set of coordinates and the second set of coordinates and generating a refinement dataset based on the translation; wherein the refinement dataset corresponds to a translation from the second set of coordinates to the first set of coordinates.
- Clause 14. The computer-implemented method according to any of clauses 12-13, the method further comprising: receiving an input at a user input device to enable identifying the points of interest in the first set of images and the second set of images; wherein the points of interest are identified in the first set of images and the second set of images based on the input; and wherein the user input device is in communicable connection with a processor of the first computing device.
- Clause 15. The computer-implemented method according to any of clauses 12-14, wherein the user input device comprises a touch-screen interface integral with a display of the first computing device.
- Clause 16. The computer-implemented method according to any of clauses 12-15, the method further comprising: obtaining, from a second computing device, a third set of coordinates; wherein the translation is further determined based on the third set of coordinates.
- Clause 18. The computer-implemented method according to any of clauses 12-17, the method further comprising: associating a first set of tracklets with the moving object based on the first set of images, the first set of tracklets corresponding to the points of interest of the moving object; determining a range of motion and a pattern of motion of the moving object based on a location of the first set of tracklets; associating a second set of tracklets with the moving object based on the second set of images, the second set of tracklets corresponding to the points of interest of the moving object; and determining a range of motion and a pattern of motion of the moving object based on the location of the second set of tracklets.
- Clause 19. The computer-implemented method according to any of clauses 12-18, the method further comprising: obtaining a training dataset comprising image data corresponding to movement of one or more objects; and training a machine learning model based on the training dataset to enable determining sets of coordinates corresponding to the moving object in images; wherein the first computing device identifies the moving object, the range of motion, and the pattern of motion of the moving object based on the training dataset.
- Clause 20. The computer-implemented method according to any of clauses 12-19, wherein the refinement dataset is configured to refine a certain movement of a prosthetic device in a real-life environment of the moving object so that a certain movement performed by the prosthetic device of the moving object is similar to the moving object in the first set of images.
Abstract
Systems and methods for tracking body kinetics including a processor and a non-transitory memory storing therein instructions executable by the processor to perform operations including obtain a first set of images including a moving object, determine a visual environment, identify and track the moving object in the visual environment, identify points of interest in the first set of images representative of the moving object, and generate a first set of coordinates based on a position of the points of interest. The operations may further include obtain a second set of images including the moving object, identify and track the moving object in the visual environment, identify points of interest of the moving object in the second set of images, generate a second set of coordinates based on the position of the points of interest, and determine a translation between the first set of coordinates and the second set of coordinates.
Description
SYSTEMS, DEVICES, AND COMPUTERIZED METHODS FOR TRACKING AND DISPLAYING MOVING OBJECTS ON MOBILE DEVICES
CROSS-REFERENCE TO RELATED APPLICATIONS
[001] This application claims priority to U.S. Provisional Application 63/641,047 filed on May 1, 2024, the entire disclosure of which is incorporated herein by reference.
FIELD
[002] The present disclosure relates to systems, devices, and computerized methods for tracking and displaying moving objects on, for example, mobile devices.
BACKGROUND
[003] The tracking of moving objects can be applied towards a broad range of applications, for example, when evaluating an object’s response to conditioning based on body and limb positions during a series of movements. When performing such evaluations, a user typically relies on visual inspection and/or manual tracking of body and limb positions in an image, and then comparing the image to other image(s) to identify differences therebetween.
[004] The accuracy, reliability, and functionality of these techniques can be limited due to necessitating a certain amount of human intervention, which may be further limited due to external factors such as, for example, the variability of objects, visual obstructions in images, and limits on viewing angles in the captured images, which can affect the overall performance of performing the motion detection and analysis. In addition, the need for user input or feedback to analyze movement patterns and to make determinations on the significance of variances can render the process highly subjective and the reliability of any determinations made based on such analysis can demonstrate inconsistencies due to internal factors such as, for example, the user’s age, bias, experience, and other such factors.
SUMMARY
[005] In some embodiments, a system for tracking body kinetics includes a processor; and a non-transitory computer readable medium having stored therein instructions that are executable by the processor to perform operations including obtain a first dataset corresponding to a first set of images, the first set of images including a moving object; determine a visual environment based on the first set of images; analyze the first set of images to identify and track the moving object in the visual environment; identify points of interest in the first set of images representative of the moving object; and generate a first set of coordinates as output based on a position of the points of interest in the visual environment.
[006] In some embodiments, the operations performed by the processor further including obtain a second dataset corresponding to a second set of images, the second set of images including the moving object; analyze the second set of images to identify and track the moving object in the visual environment; identify points of interest in the second set of images representative of the moving object; and generate a second set of coordinates as output based on the position of the points of interest in the visual environment.
[007] In some embodiments, the operations performed by the processor further including determine a translation between the first set of coordinates and the second set of coordinates and generate a refinement dataset based on the translation. In some embodiments, the refinement dataset corresponds to a translation from the second set of coordinates to the first set of coordinates.
[008] In some embodiments, the operations performed by the processor further including obtain, from a second computing device, a third set of coordinates. In some embodiments, the translation is further determined based on the third set of coordinates.
[009] In some embodiments, the third set of coordinates corresponds to a movement of a prosthetic device in a real-life environment of the object, the movement of the prosthetic device being captured in the second set of images.
[0010] In some embodiments, the system further includes a user input device in communicable connection with the processor.
[0011] In some embodiments, the user input device includes a touch-screen interface integral with a display.
[0012] In some embodiments, the processor is configured to identify the moving object in image data based on regions selected using the user input device.
[0013] In some embodiments, the processor further performs operations including associate a first set of tracklets with the moving object based on the first set of images, the first set of tracklets corresponding to the points of interest of the moving object; determine a range of motion and a pattern of motion of the moving object based on a location of the first set of tracklets; associate a second set of tracklets with the moving object based on the second set of images, the second set of tracklets corresponding to the points of interest of the moving object; and determine a range of motion and a pattern of motion of the moving object based on the location of the second set of tracklets.
[0014] In some embodiments, the processor further performs operations including obtain a training dataset including image data corresponding to movement of one or more objects; and train a machine learning model based on the training dataset to enable determining sets of coordinates corresponding to the moving object in images. In some embodiments, the processor identifies the moving object, the range of motion, and the pattern of motion of the moving object based on the training dataset.
[0015] In some embodiments, the refinement dataset is configured to refine a certain movement of a prosthetic device in a real-life environment of the object so that the certain movement is similar to the moving object in the first set of images.
[0016] In some embodiments, a computer-implemented method includes obtaining, by a first computing device, a first dataset corresponding to a first set of images, the first set of images including a moving object; determining a visual environment based on the first set of images; identifying and tracking the moving object in the visual environment based on the first set of images; identifying points of interest in the first set of images representative of the moving object; and generating a first set of coordinates as output based on a position of the points of interest in the first set of images.
[0017] In some embodiments, the method further including obtaining, by the first computing device, a second dataset corresponding to a second set of images, the second set of images including the moving object; identifying and tracking the moving object in the visual environment; identifying points of interest in the second set of images representative of the moving object; generating a second set of coordinates as output based on the position of the points of interest in the second set of images; and determining a translation between the first set of coordinates and the second set of coordinates and generating a refinement dataset based on the translation. In some embodiments, the refinement dataset corresponds to a translation from the second set of coordinates to the first set of coordinates.
[0018] In some embodiments, the method further including receiving an input at a user input device to enable identifying the points of interest in the first set of images and the second set of images. In some embodiments, the points of interest are identified in the first set of images and the second set of images based on the input. In some embodiments, the user input device is in communicable connection with a processor of the first computing device.
[0019] In some embodiments, the user input device includes a touch-screen interface integral with a display of the first computing device.
[0020] In some embodiments, the method further including obtaining, from a second computing device, a third set of coordinates. In some embodiments, the translation is further determined based on the third set of coordinates.
[0021] In some embodiments, the third set of coordinates corresponds to a movement of a prosthetic device in a real-life environment of the moving object, the movement of the prosthetic device being captured in the second set of images.
[0022] In some embodiments, the method further including associating a first set of tracklets with the moving object based on the first set of images, the first set of tracklets corresponding to the points of interest of the moving object; determining a range of motion and a pattern of motion of the moving object based on a location of the first set of tracklets; associating a second set of tracklets with the moving object based on the second set of images, the second set of tracklets corresponding to the points of interest of the moving object; and determining a
range of motion and a pattern of motion of the moving object based on the location of the second set of tracklets.
[0023] In some embodiments, the method further including obtaining a training dataset including image data corresponding to movement of one or more objects; and training a machine learning model based on the training dataset to enable determining sets of coordinates corresponding to the moving object in images. In some embodiments, the first computing device identifies the moving object, the range of motion, and the pattern of motion of the moving object based on the training dataset.
[0024] In some embodiments, the refinement dataset is configured to refine a certain movement of a prosthetic device in a real-life environment of the moving object so that a certain movement performed by prosthetic device of the moving object is similar to the moving object in the first set of images.
BRIEF DESCRIPTION OF THE DRAWINGS
[0025] Some embodiments of the disclosure are herein described, by way of example only, with reference to the accompanying drawings. With specific reference now to the drawings in detail, it is stressed that the embodiments shown are by way of example and for purposes of illustrative discussion of embodiments of the disclosure. In this regard, the description taken with the drawings makes apparent to those skilled in the art how embodiments of the disclosure may be practiced.
[0026] FIG. 1 is a schematic diagram of a non-limiting example of a system, according to some embodiments.
[0027] FIG. 2 is a schematic diagram of a non-limiting example of a system, according to some embodiments.
[0028] FIG. 3 is a flow diagram of a method for training models for performing image processing techniques, according to some embodiments.
[0029] FIG. 4 is a flow diagram of a method for performing the image processing techniques, according to some embodiments.
[0030] FIG. 5 is a graphical diagram illustrating a non-limiting example of a type of movement which may be detected and tracked, according to some embodiments.
[0031] FIG. 6 is a flow diagram of a method for performing the image processing techniques, according to some embodiments.
[0032] FIG. 7 is a flow diagram of a method for performing the image processing techniques, according to some embodiments.
DETAILED DESCRIPTION
[0033] Various embodiments of the present disclosure are directed to tracking of moving objects in images in near real-time using computer-based models including, for example, image processing models. The embodiments described herein may be provided in computing devices such as, for example, in mobile computing devices. The computing device may include a processor, a non-transitory computer readable media device such as, for example, a memory or hard-disk drive, and one or more sensor devices for capturing images such as, for example, a camera. According to some embodiments, the computing devices can include instructions stored in the memory such as, for example, software applications executable by the processor to perform operations in accordance with one or more embodiments described herein.
[0034] According to some embodiments, the models utilized in these systems and computing devices may include, for example, machine vision models, machine learning models, Al models, and the like, that can replace vision/recognition tasks traditionally performed by humans or tasks that traditionally needed a certain amount of human input or feedback to process the image processing techniques in real-time. The image processing techniques that may be applied by these models can include, according to some embodiments, processing images or a series of images to identify and track an object or objects in the images. In some embodiments, the image processing techniques applied by the models can include processing the images to identify and track the movement of objects in a visual environment generated based on the images, to identify and track the movement of portions of these objects (e.g., arms, legs, wings, etc.), and to make predictions of a condition of the objects based on the analysis. According to some embodiments, the moving objects captured in these images may be in fields such
as, for example, sports, physical therapy, physical conditioning, medical diagnostics, cell biology, astronomy, ornithology, equestrian sports, and the like. Such objects may include, for example, humans, horses, birds, reptiles, and other like objects. According to some embodiments, the objects captured in the images may be at a cellular level. According to some embodiments, the objects may include astronomical objects, vehicles, automobiles, motorcycles, bicycles, airplanes, drones, and other like objects.
[0035] Systems including these models can apply the image processing techniques to track object(s) in the captured images or video such as, for example, movement of a person including one or more of their limbs including, but not limited to, head, neck, torso, arms, hands, fingers, legs, feet, portions thereof, or any combinations thereof. According to some embodiments, the images may be captured by an external device such as, for example, a handheld DSLR camera, and a computing device may obtain the images from the external device to be processed using the models described herein. The image processing techniques may be used to compare images of an object’s movement either between images or between sets of images, such as between images captured during different points in time to make predictions associated with the object based on the images. For example, the system may process images capturing a person currently undergoing physical therapy performing a sequence of movements to make certain predictions of the type of injury or to provide recommendations for therapeutic exercises that can be performed to help correct for any identified issues.
[0036] Various embodiments of the present disclosure relate to systems, devices, computer-implemented methods, and non-transitory computer readable media for performing the image processing techniques for tracking moving objects in images including tracking body kinetics. According to various embodiments, the techniques can include determining data corresponding to differences in the object’s movement based on a comparison of the images to reference data, and visually displaying the differences between the object’s movement in the images and the reference data. For example, a mobile computing device (e.g., mobile cellular phone) having a camera and a display may be used to capture images of
a set of movements by an object and the mobile computing device may process the images to identify differences between the object’s movement in the images compared to reference data such as, for example, historical image data, and may visually display the differences on the display of the mobile computing device in near real-time. According to some embodiments, the image processing techniques for tracking of moving objects in images may include calculating proportional differences, rotational differences, acceleration differences, speed differences, other like data, or any combinations thereof, between the moving object in the images and the reference data. For example, the image processing techniques may be used to track an athlete’s swing sequence in images captured during different time periods and view differences in the athlete’s swing sequence by displaying the images captured during one time period relative to images captured during another time period.
[0037] According to some embodiments, the image processing techniques can include determining the object is demonstrating movements associated with certain physical issues based on comparing the object’s tracked movements from the captured images to reference data and determining data corresponding to differences between the object’s movement in the captured images and the reference data. According to some embodiments, the reference data may include data of similar movements by the same athlete. In some embodiments, the reference data may include historical image data of the same object that is captured in the images being processed. In other embodiments, the reference data may include historical image data of objects that are similar in type to the object captured in the images being processed. Based on the difference data generated based on the object’s tracked movements and based on data corresponding to similar movements of objects from image data in the reference dataset, a prediction associated with the object can be made. For example, the system may provide a prediction, based on the athlete’s tracked movements in the images and based on the processing of those images, that the athlete displays indications of decreased mobility in their leg due to, for example, a pulled hamstring muscle, or some other type of physical issue that affects the athlete’s movement.
[0038] In various embodiments, the image processing techniques for tracking of moving objects in images may include generating and providing display data to a display device, the display data corresponding to visual representations highlighting variances in the object’s movement compared to reference data such as, for example, historical image data, determined based on comparing differences between features extracted from the captured images to the reference dataset. In some embodiments, the reference data may correspond to pre-captured image data of different object types including, for example, data of the certain moving object being analyzed in the images. In this regard, the system may, based on the data corresponding to differences determined between the image data and the reference data, generate and output display data corresponding to the visual representations of real-time highlighting in the captured image or images of the moving object, or may display such data on a display in electronic communication with the system. The highlighting may, according to some embodiments, be indicative of one or more differences including, but not limited to, joint limit, joint speed, joint acceleration, joint distance, object speed, object acceleration, object distance, other indications, or any combinations thereof.
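A sketch of how display data highlighting such variances might be generated is shown below, assuming two-dimensional pixel positions for the points of interest and precomputed per-joint deltas. OpenCV is used here only for the drawing calls; the function name, color, box size, and threshold are illustrative assumptions rather than the disclosed implementation.

```python
import numpy as np
import cv2  # OpenCV, used here only to illustrate drawing the highlight

def highlight_variances(image, joint_positions, joint_deltas, threshold=0.15, box=20):
    """Draw an alert box around each point of interest whose deviation from the
    reference data exceeds the threshold, and label it with the delta value."""
    out = image.copy()
    for name, (x, y) in joint_positions.items():
        delta = joint_deltas.get(name, 0.0)
        if delta > threshold:
            top_left = (int(x) - box, int(y) - box)
            bottom_right = (int(x) + box, int(y) + box)
            cv2.rectangle(out, top_left, bottom_right, (0, 0, 255), 2)
            cv2.putText(out, f"{name}: {delta:.2f}", (top_left[0], top_left[1] - 5),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.4, (0, 0, 255), 1)
    return out

frame = np.zeros((240, 320, 3), dtype=np.uint8)
positions = {"right_knee": (160, 120), "right_ankle": (170, 200)}
deltas = {"right_knee": 0.05, "right_ankle": 0.31}    # only the ankle exceeds 0.15
annotated = highlight_variances(frame, positions, deltas)
```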
[0039] According to some embodiments, the image processing techniques may be applied to images of a moving object to track the movement of the object, or a portion thereof, based on the images. Tracking the object may include determining one or more points of interest associated with the object based on the movement of the object in the images. Tracking the object may also include determining a location of the points of interest based on the images. In some embodiments, tracking the object includes determining a virtual environment. In addition, tracking the object can include determining a location of the points of interest in the virtual environment. In some embodiments, the location of the points of interest may include, for example, coordinates along one or more axes including, for example, a first axis, a second axis, a third axis, other axes, or any combinations thereof. The points of interest can include appendages, limbs, torso, joints, other portions of the object, other objects, or any combinations thereof. In some embodiments, the points of interest can include tracklets.
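One possible, illustrative representation of points of interest and tracklets is sketched below. The class names, joint labels, and the three-axis coordinate convention are assumptions made for exposition and are not prescribed by the disclosure.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Coordinate = Tuple[float, float, float]   # position along a first, second, and third axis

@dataclass
class PointOfInterest:
    """A tracked point of interest (e.g., a joint) with one coordinate per image."""
    name: str
    coordinates: List[Coordinate] = field(default_factory=list)

    def add_observation(self, position: Coordinate) -> None:
        self.coordinates.append(position)

@dataclass
class Tracklet:
    """A segment of the object extending between two points of interest."""
    start: PointOfInterest
    end: PointOfInterest

# Example: a wrist point of interest observed in three consecutive images.
wrist = PointOfInterest("right_wrist")
for position in [(0.10, 0.52, 0.0), (0.12, 0.55, 0.0), (0.15, 0.57, 0.0)]:
    wrist.add_observation(position)
elbow = PointOfInterest("right_elbow", [(0.05, 0.40, 0.0)] * 3)
forearm = Tracklet(start=elbow, end=wrist)
print(len(forearm.end.coordinates))   # 3
```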
[0040] In some embodiments, the image processing techniques may be applied to a first set of images and a second set of images. The image processing techniques may be applied to the first set of images to determine a first set of coordinates for points of interest in the first set of images. In some embodiments, the first set of images may capture a moving object during a first time period. The image processing techniques may also be applied to the second set of images to determine a second set of coordinates for points of interest in the second set of images. In some embodiments, the second set of images may capture a moving object during a second time period, the second time period occurring after the first time period. In addition, the image processing techniques may include determining differences between the first set of coordinates and the second set of coordinates and determining a refinement dataset, the refinement dataset corresponding to a translation of the second set of coordinates to the first set of coordinates in the virtual environment.
[0041] The refinement dataset may then be utilized to refine a position of the object in a real-life environment of the object from the second set of coordinates to the first set of coordinates. That is, the object movement as captured in the second set of images may be refined to be similar or substantially similar to the object movement in the first set of images. For example, the first set of images may be of a person’s left arm movement performing a first action during a first period of time, and the second set of images may be of the person’s left arm prosthetic movement performing a similar second action during a second period of time, and the image processing techniques may include refinement data corresponding to coordinate data based on a coordinate system scheme associated with the prosthetic to refine the movement of the person’s prosthetic when performing the second action. In this regard, an operation of the prosthetic device may be controlled by a computing device including a processor and a memory having stored therein instructions executable by the processor to enable the prosthetic device to perform movement operations, and the refinement data may be utilized by the computing device to refine the movement of the prosthetic device in the real-life environment of the user.
[0042] In some embodiments, the second set of coordinates may be determined based on the second set of images. In other embodiments, the second set of coordinates may be determined based on the second set of images and based on corresponding coordinate data of the second set of movements from the computing device associated with the prosthetic. In this regard, the translating of the second set of coordinates to the first set of coordinates may be further refined based on the coordinate data from the computing device associated with the prosthetic device. In addition, in some embodiments, the refinement data may be based on a coordinate system of the computing device, a coordinate system of the computing device associated with the prosthetic, or another coordinate system.
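As a hedged illustration of expressing refinement data in another device’s coordinate system, the sketch below applies a known rotation and offset between the virtual environment and a hypothetical prosthetic controller frame. The calibration values, the function name, and the assumption that the transform is known in advance are all illustrative and not part of the disclosure.

```python
import numpy as np

def to_device_frame(points, rotation, offset):
    """Express points given in the virtual environment's coordinate system in the
    coordinate system of another computing device (e.g., a prosthetic controller).

    `rotation` is a 3x3 matrix and `offset` a 3-vector describing the device frame
    relative to the virtual environment; both would come from calibration.
    """
    points = np.asarray(points, dtype=float)
    return points @ rotation.T + offset

# Illustrative calibration: device axes rotated 90 degrees about z, shifted origin.
rotation = np.array([[0.0, -1.0, 0.0],
                     [1.0,  0.0, 0.0],
                     [0.0,  0.0, 1.0]])
offset = np.array([0.1, 0.0, -0.05])

refined_positions = [[0.10, 0.52, 0.0], [0.12, 0.55, 0.0]]
print(to_device_frame(refined_positions, rotation, offset))
```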
[0043] Among those benefits and improvements that have been disclosed, other objects and advantages of this disclosure will become apparent from the following description taken in conjunction with the accompanying figures. Detailed embodiments of the present disclosure are disclosed herein; however, it is to be understood that the disclosed embodiments are merely illustrative of the disclosure that may be embodied in various forms. In addition, each of the examples given regarding the various embodiments of the disclosure is intended to be illustrative, and not restrictive.
[0044] FIG. 1 is a schematic diagram illustrating a non-limiting example of a system 100, according to some embodiments. System 100 may be a computing device such as, for example, a personal computing device associated with a user. In some embodiments, system 100 may be a mobile computing device such as, for example, a smart cellular telephone, tablet, laptop, personal digital assistant (PDA), augmented reality (AR) device such as a headset, or other like devices.
[0045] System 100 may include side 102 and side 104 opposite the side 102. In FIG. 1, both the sides 102, 104 of system 100 are shown for simplicity purposes. System 100 may include one or more components therein including processor 106, memory 108, image sensor 110, and display 112. The one or more components of system 100 may be located in housing 114. The image sensor 110 may be located on a side of housing 114. In FIG. 1, for example, image sensor 110 is shown located on side 104. Although not shown in the figures, in some
embodiments, system 100 may include the image sensor on side 102, image sensor on side 104, or image sensors on both side 102 and side 104. In other embodiments, such as shown in FIG. 2, the system 100 may be in electronic communication with an external image sensor. The display 112 may also be located on a side of housing 114. For example, FIG. 1 shows display 112 located on side 102 of housing 114. In some embodiments, system 100 may include display 112 on side 102, side 104, or on both the sides 102, 104.
[0046] The processor 106 may be a microprocessor, such as that included in a device such as a smart phone. The processor is configured to analyze images from the image sensor 110 to identify moving objects in images, including slow-moving objects and fast-moving objects, track the identified object’s movement between images, determine differences in the object’s movement compared to reference data, and then display image data corresponding to visual representations of the differences. For example, the display may show images from the image sensor with alert frames overlaid on an identified portion or portions of the moving object (e.g., limbs) indicative of differences in the object’s movement compared to reference data. In some embodiments, the processor may be additionally configured to crop images or to mark certain areas of images and exclude those areas from analysis. For example, objects other than the target object may be excluded from analysis. In some embodiments, the processor may receive input from a user input device, for example touch-screen input from the display 112 and, based on the input, determine areas to crop out or exclude from analysis. In some embodiments, the types of alerts, including audio components and the shape, color, or effects such as flashing of alert boxes on the display 112, may be selected.
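A minimal sketch of excluding user-selected regions from analysis follows, assuming rectangular regions drawn on a touch-screen display. The function names and the (x0, y0, x1, y1) region format are illustrative assumptions.

```python
import numpy as np

def build_exclusion_mask(shape, excluded_regions):
    """Return a boolean mask that is False inside user-selected regions.

    `excluded_regions` is a list of (x0, y0, x1, y1) rectangles, e.g. drawn on a
    touch-screen display; `shape` is the (height, width) of the image.
    """
    mask = np.ones(shape, dtype=bool)
    for x0, y0, x1, y1 in excluded_regions:
        mask[y0:y1, x0:x1] = False
    return mask

def apply_exclusions(image, mask):
    """Zero out excluded areas so they are ignored by later analysis stages."""
    return np.where(mask[..., None], image, 0) if image.ndim == 3 else np.where(mask, image, 0)

frame = np.full((240, 320, 3), 128, dtype=np.uint8)
mask = build_exclusion_mask(frame.shape[:2], excluded_regions=[(0, 0, 320, 40)])  # crowd at the top
print(apply_exclusions(frame, mask)[0, 0])    # [0 0 0] (excluded pixel)
```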
[0047] Memory 108 is a non-transitory computer readable data storage device. The memory 108 can be, for example, a flash memory, magnetic media, optical media, random access memory, etc. The memory 108 is configured to store data corresponding to images captured by the image sensor 110. In some embodiments, the images captured by the image sensor 110 may include additional data such as timestamps, or may be modified by the processor, for example cropping the image
files or marking zones of the image files as excluded from analysis by the processor 106.
[0048] The image sensor 110 may be, for example, a digital camera. The image sensor 110 may capture a series of images over a time period. The image sensor 110 may capture video corresponding to a series of images over the time period. The image sensor 110 may also add additional data to the captured images, such as metadata, for example timestamps. In some embodiments, the image sensor 110 may be an infrared camera or a thermal imaging sensor. In an embodiment, the image sensor 110 is a digital sensor. In an embodiment, the image sensor 110 includes an image stabilization feature. The frame rate and/or the resolution of the image sensor 110 affects a sensory range of the system. A greater frame rate may increase the sensory range of the system. A greater resolution may increase the sensory range of the system.
[0049] Display 112 may be a display device or component which includes light emitting diodes (LED), organic light emitting diodes (OLED), liquid crystal display (LCD), and other like types of display devices. For example, the display 112 can be a component of a smart phone or a tablet device. Display 112 receives processed image data from the processor 106 and displays the processed image data. The display 112 may include a user input feature, for example where the display is a touchscreen device. The input feature may be used, for example, to define regions of the images to exclude from analysis, or to select options and set parameters for those options such as distance and velocity thresholds for alarms or time windows for performing tracking operations.
[0050] Housing 114 may be a metal and/or plastic casing covering the processor 106 and memory 108, and with at least one image sensor 110 disposed on side 104 and the display 112 disposed on side 102. The housing 114, image sensor 110, memory 108, and processor 106 may be, for example, a computing device including a smart phone device.
[0051] FIG. 2 is a schematic diagram illustrating a non-limiting example of a system 200, according to some embodiments. System 200 includes processor 202, memory 204, and multiple image sensors 206, which may be in electronic communicable
connection with each other. The processor 202 and the memory 204 may be located separately from the multiple image sensors 206. System 200 may include a display device for displaying image data. In some embodiments, the display may be located separately from processor 202 and memory 204.
[0052] System 200 may be in electronic communicable connection with one or more other computing devices such as, for example, system 100 in FIG. 1, and the other computing device may display image data on its display. The processor 202, memory 204, and multiple image sensors 206 may be in electronic communicable connection through one or more different types of electronic connections. In some embodiments, one or all of the processor 202, memory 204, and multiple image sensors 206 may also be in electronic communicable connection with other computing devices such as system 100 through one or more different types of electronic connections. The components of system 200 may be in electronic communicable connection through wired connections and/or wireless connections including, but not limited to, Wi-Fi, Bluetooth, Bluetooth Low Energy (BLE), 5G, 4G, LTE, CDMA, other wireless communication protocols, or any combinations thereof. For example, the wired connections can include USB, ethernet, and the like.
[0053] In some embodiments, the multiple image sensors 206 may include a fixed camera. The camera can be mounted on a vehicle (e.g., land vehicle, water vehicle, or air vehicle). For example, the camera can be mounted to a wired suspension assembly configured to move the camera along one or more axes to capture images of persons moving on a sporting field. In another example, the camera can be mounted on an aerial drone device. In some embodiments, the camera may be portable and located in a housing that may be fixed to another object, such as a static object, for example a tree, or a movable object, for example a helmet. In another example, the camera may be located in a housing that may be hand-held. In some embodiments, the multiple image sensors 206 may be part of a single three-dimensional (3-D) camera such as a stereoscopic camera. In embodiments with a 3-D camera or multiple image sensors, the distance to an object or its approximate size may be determined based on the images from each of the image sensors.
[0054] In some embodiments, there may be an additional user input tool, for example a keyboard and/or mouse in communication with the processor 202, or integrated into components of the system 200, such as a display device similar to display 112 having touch-screen functionality. These user input tools may be used for certain options, such as selecting rules for alarms or notifications, defining areas to exclude from analysis, or activating or deactivating the tracking functionality for particular periods of time, for example disabling object movement tracking and analysis between different intervals. The display may be a two-dimensional display screen such as an LED, LCD, or OLED display. In some embodiments, the display may be a VR device such as a headset, or an AR device such as a head-mounted display with a translucent or transparent screen.
[0055] FIG. 3 is a flow diagram of a method 300 for training models for performing image processing techniques, according to some embodiments. At block 302, the method 300 includes establishing a visual environment in images. Establishing the visual environment may include identifying static or non-moving areas that are not tracked and identifying moving objects in the images for tracking. Establishing the visual environment may include classifying objects in the images.
[0056] The visual environment may be the vertical and horizontal area captured by an image sensor such as, for example, image sensor 110 in FIG. 1. That is, the visual environment may represent the visual range of the image sensor. The visual environment may correspond to the characteristics of images provided to a user via a display such as, for example, display 112 in FIG. 1. The images provided to the user via the display may include, for example, the field of view and the resolution of the displayed image.
[0057] Identified static or non-moving areas (e.g., no movement areas) may be determined by a lack of change in a portion of the visual environment (e.g., pixel or pixels) across multiple frames of the visual environment captured by the image sensor. The aspects of the static or non-moving areas that do not change may include the color, hue, shade or brightness of those areas. Static or non-moving areas may include pixels representative of the ground and fixed features such as, for example, trees, structural members, and the like, which are not moving relative
to the image sensor. For example, the image sensors may be at an elevated fixed position and may capture images of the moving objects and the ground during an application such as during a horse race. Static or non-moving areas do not trigger any alerts or tracking as will be further described herein.
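The identification of static (no movement) areas by a lack of change across frames can be illustrated with the following sketch, which marks a pixel as static when its brightness varies less than a threshold over the captured frames. The threshold value, array shapes, and function name are illustrative assumptions.

```python
import numpy as np

def static_area_mask(frames, change_threshold=5.0):
    """Mark pixels as static when their brightness varies less than the threshold
    across the captured frames; everything else is treated as potentially moving.

    `frames` is an array of shape (num_frames, height, width) of grayscale images.
    """
    frames = np.asarray(frames, dtype=float)
    per_pixel_range = frames.max(axis=0) - frames.min(axis=0)
    return per_pixel_range < change_threshold

rng = np.random.default_rng(1)
frames = np.full((3, 64, 64), 100.0)                  # unchanging ground / fixed features
frames[:, 20:30, 20:30] += rng.uniform(0, 50, size=(3, 10, 10))  # a changing (moving) region
static = static_area_mask(frames)
print(static[0, 0], static[25, 25])                   # True (static), likely False (moving)
```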
[0058] At block 304, the method 300 includes identifying objects in the visual environment of the images. Identifying the objects may include identifying regions in the image or images corresponding to the background and one or more moving objects in the image. The image may include one or more objects moving in the image or images. The objects may include, for example, first object, second object, and through nth objects captured in an image or series of images during a certain time period. In some embodiments, one or more objects may be captured in images by an image sensor and displayed on a display device such that a target object for tracking may be identified based on one or more inputs received at an input device. For example, the inputs may correspond to a user selection of the target object. In another example, the inputs may correspond to a region of the image including therein at least a portion of the image and the one or more techniques herein can analyze the selected region and identify the target object.
[0059] Identifying the objects in the images may include identifying regions in the image or series of images corresponding to pixels representative of objects, or pixels representative of a target object among one or more objects. For example, a boundary may be identified in the images defining the object, the boundary including the object’s torso and any appendages. Identifying the objects may include classifying objects. For example, the images may capture more than one moving object in the image. In this regard, one or more objects and/or one or more different types of objects including a target object may be identified in the images for tracking. For example, the tracking may be performed on a particular object, e.g., athlete, in a group of other similar objects, e.g., other athletes, in the images.
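A simple, illustrative way to derive a boundary around the region of pixels identified as representing the target object is sketched below; the mask-based approach and the function name are assumptions made for exposition.

```python
import numpy as np

def object_boundary(object_mask):
    """Return the (x0, y0, x1, y1) bounding box enclosing all pixels that were
    identified as representing the target moving object, or None if empty."""
    ys, xs = np.nonzero(object_mask)
    if xs.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1

mask = np.zeros((120, 160), dtype=bool)
mask[40:90, 60:100] = True          # pixels classified as the target object
print(object_boundary(mask))        # (60, 40, 100, 90)
```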
[0060] In some embodiments, user input may be used to define regions which are excluded from the analysis. Excluded regions may include, for example, areas unlikely to contain an object of interest, for example the ground, areas with an excess of distractors such as a crowd in the background, or areas which are not
useful to the user such as outside a playing surface. The user input for defining the region may be interaction with a display or user interface showing the view of the image sensor, with the input being a touch-screen input such as drawing a region to exclude or using two or more fingers to define the region to be excluded, or selection using a cursor such as one controlled by a mouse, track-pad, joy-stick or other such control interfacing with the processor and the display. The excluded regions may be removed from the portions of the image which are analyzed, may be treated as non-moving or static areas (e.g., no movement areas), or may be treated as distractors. For example, the excluded regions may be cropped out of the image sensor images prior to their processing in establishing the visual environment.
[0061] At block 306, the method 300 includes identifying joints for the target object and associating the joints with the moving object in the visual environment. Identifying the joints may include identifying, based on the region of pixels identified as representing the target moving object, tracklets corresponding to appendages and joints of the object identified based on rendering a frame of the object. For example, the joints may correspond to elbows, knees, ankles, and wrists associated with a person being tracked in the images. Referring to FIG. 3, a target object may include, for example, a torso, a first appendage having a first joint associated therewith, a second appendage having a first and second joint associated therewith, and through an nth appendage having an nth joint associated therewith.
[0062] Identifying the set of joints may include identifying, for each image, a set of tracklets based on a frame of the object, each tracklet in the set of tracklets being associated with a respective limb of the object or a segment extending between respective joints at the ends of the tracklet.
[0063] In some embodiments, identifying the joints may include determining a 2-D frame of the moving object based on the 2-D images, and rendering a 3-D frame of the object based on applying one or more algorithms to the 2-D frame to map the object in a 3-D coordinate system, the rendering process for generating the 3-D model being capable of compensating for occlusion due to using 2-D images as input.
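The sketch below illustrates one way the identified joints and tracklets (segments extending between joints) might be represented, together with a joint angle computed from three joint positions. The skeleton topology, joint names, and function signature are illustrative assumptions and do not reflect a particular disclosed data structure.

```python
import numpy as np

# Illustrative skeleton topology: each tracklet is a segment between two joints.
TRACKLETS = [("shoulder", "elbow"), ("elbow", "wrist")]

def joint_angle(a, b, c):
    """Angle (in degrees) at joint `b` formed by the segments b->a and b->c,
    e.g. the elbow angle formed by the upper arm and the forearm."""
    a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
    u, v = a - b, c - b
    cosine = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0))))

joints = {"shoulder": (0.0, 1.4, 0.0), "elbow": (0.0, 1.1, 0.0), "wrist": (0.3, 1.1, 0.0)}
segments = {pair: (joints[pair[0]], joints[pair[1]]) for pair in TRACKLETS}
print(joint_angle(joints["shoulder"], joints["elbow"], joints["wrist"]))  # 90.0
```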
[0064] At block 308, the method 300 includes determining a pattern of motion of the target object based on a movement of the joints in the visual environment and generating an output dataset corresponding to the pattern of motion of the object. In some embodiments, the pattern of motion may include a joint range of the moving object. Determining the joint range may include tracking a position of the object's joints and determining a pattern of motion of the target object based on determining rotational limits, acceleration, and speed of the joints. This pattern data, including the relative joint positions, rotational limits, acceleration, and speed of the joints, may be stored in the memory. Determining the joint range may also include, based on the pattern of motion of the joints, determining an overall speed relative to the real-life environment, acceleration data, and direction of the moving object. In some embodiments, determining the pattern of motion for the object may include calculating, between each image of the set of images, movement of the respective set of tracklets between each image frame to determine the pattern of motion for the object.
[0065] Determining the joint range may include identifying areas or regions associated with the object and its appendages in an environment such as, for example, the visual environment, multi-dimensional coordinate system, or some other environment, where a rate of change in one or more aspects is greater than zero or a minimum value. In some embodiments, the rate of change may be determined based on exceeding a threshold value. The aspects which may change in the pixels of the object may include the color, hue, shade, and/or brightness of that area or object. In an embodiment, a determination is made that an object is moving based on whether one or more threshold values for the rate of change of the aspects of that area or object are satisfied. The size of the area of the visual environment where the aspects are changing may also be used to determine whether an object is moving and the rate at which the object or its appendages and joints are moving.
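A minimal sketch of such a rate-of-change test, assuming simple grayscale frame differencing with illustrative threshold and minimum-area values, is shown below; the disclosed techniques are not limited to this computation.

```python
# Hedged sketch: rate-of-change detection between consecutive frames.
import numpy as np

def moving_regions(prev: np.ndarray, curr: np.ndarray,
                   change_threshold: float = 25.0,
                   min_area: int = 50) -> np.ndarray:
    """Return a boolean mask of pixels whose brightness changed enough.

    prev, curr -- consecutive grayscale frames
    """
    delta = np.abs(curr.astype(float) - prev.astype(float))
    mask = delta > change_threshold
    # Only report motion if a sufficiently large area changed.
    return mask if mask.sum() >= min_area else np.zeros_like(mask, dtype=bool)

prev = np.zeros((480, 640))
curr = prev.copy()
curr[200:260, 300:360] = 255.0           # a bright patch appears, simulating movement
print(moving_regions(prev, curr).sum())  # number of moving pixels detected
```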
[0066] The threshold values for rate of change and/or the area over which aspects change may, in an embodiment, be one or more predetermined values. The predetermined values may be selected, for example, based on the use case in which this method
is applied; for example, an embodiment based on tracking a racehorse performing a sequence of movements may use one set of threshold values, while an embodiment for use in monitoring flight patterns of birds may have a different set of predetermined threshold values. In an embodiment, the threshold value is selected (i.e., set) by a user via a user interface (e.g., a graphical user interface displayed on a display). These moving areas may include large areas including therein one or more objects, such as, in some examples, birds flying in a flock or stars moving across the sky. These objects may, in some embodiments, be user defined, for example through selection of regions via a user interface such as a touch screen. For detected objects, there may be an initial alert such as a presentation of a box around the object.
[0067] The moving objects are tracked continuously. As each image is processed, the object corresponding to the target object is identified and tracked as the target object moves in the image or in each frame of a series of images. When monitored, objects or areas corresponding to the target object may be identified, and in some embodiments, a tracking symbol such as an alert box may be displayed over the target object.
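As one hedged example of displaying a tracking symbol, the following sketch uses OpenCV to draw an alert box over the target object in each frame; the bounding-box format and color are illustrative assumptions.

```python
# Hedged sketch: overlaying a tracking alert box on the current frame.
import cv2
import numpy as np

def draw_alert_box(frame: np.ndarray, bbox) -> np.ndarray:
    """Draw a rectangle around the tracked target object.

    bbox -- (x, y, width, height) of the target object in pixel coordinates
    """
    x, y, w, h = bbox
    annotated = frame.copy()
    cv2.rectangle(annotated, (x, y), (x + w, y + h), (0, 255, 0), 2)
    return annotated

frame = np.zeros((480, 640, 3), dtype=np.uint8)
shown = draw_alert_box(frame, (250, 180, 120, 200))
```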
[0068] At block 310, the method 300 includes recording training data corresponding to movement data generated based on tracking the target object in the images. The movement data may include, but is not limited to, joint position data, pattern data, rotational limits, accelerations, joint speed, joint direction, overall speed, overall accelerations, overall direction, metadata, other movement characteristics, or any combinations thereof. The data may include other types of data including object type classifications, species, gender, classifications of inanimate or non-moving objects, other definitions, or any combinations thereof.
[0069] The training data may be utilized by the one or more techniques to enable performing the object tracking using one or more models as described herein. One or more data points generated based on the tracking may also be combined with the training data to iteratively update the reference data to provide improved object tracking functionality by the one or more techniques and one or more models. In this regard, the models may be trained using the training data and updating the
training data iteratively updates the models to provide improved functionality such as improved object classification, tracking, joint identification, pattern determination, and determination of differences between new images and the training data, as will be further described herein. According to some embodiments, any of the training data is used to train a machine-learning device, system, or both to produce an improved device, system, or both. According to some embodiments, any of the training data is used to train an artificial intelligence (AI) device, system, or both to produce an improved device, system, or both.
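A minimal sketch of this iterative update, assuming a simple off-the-shelf k-nearest-neighbors classifier and an illustrative feature layout standing in for whichever model the techniques herein actually use, might look as follows.

```python
# Hedged sketch: folding newly tracked movement samples back into the
# training data and refitting a simple classifier. Feature layout and labels
# are illustrative assumptions, not the disclosed model.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

training_features = np.random.rand(20, 6)      # e.g. joint speeds / rotations per sample
training_labels = np.random.randint(0, 2, 20)  # e.g. movement class per sample

model = KNeighborsClassifier(n_neighbors=3).fit(training_features, training_labels)

# New data points generated by tracking are appended to the training set and
# the model is refit so future tracking benefits from them.
new_features = np.random.rand(5, 6)
new_labels = model.predict(new_features)       # or labels confirmed by a user
training_features = np.vstack([training_features, new_features])
training_labels = np.concatenate([training_labels, new_labels])
model = KNeighborsClassifier(n_neighbors=3).fit(training_features, training_labels)
```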
[0070] At any of blocks 302, 304, 306, 308, and 310, the method 300 may include displaying images from the image sensor and/or any of the data generated based on performing the image processing techniques at a display device. For example, the display may show the images from the image sensor including an outline of the target object. In another example, the display may show the images and a frame associated with the target object, and data values corresponding to joint positions, speed, rotation, direction, and the like.
[0071] FIG. 4 is a flow diagram of a method 400 for performing the image processing techniques, according to some embodiments. At block 402, the method 400 includes obtaining an image or obtaining a series of images from an image sensor. The image sensor may correspond to, for example, image sensor 110 in FIG. 1. In some embodiments, the image sensor may correspond to at least one of the multiple image sensors 206 in FIG. 2. In some embodiments, obtaining the image may include receiving a first dataset comprising image data corresponding to a plurality of images from at least one image sensor.
[0072] At block 402, the images may be captured during a period of time occurring after the period of time the images were processed at block 302. That is, the images and the processing of the images to generate the training data, as shown in FIG. 3, may correspond to historical image data captured and/or processed during a period of time occurring before the time period the images at block 402 were captured.
[0073] At block 404, the method 400 includes analyzing the image data to establish a visual environment based on the images. The method 400 for establishing the
visual environment may be similar to operations performed at block 302 in FIG. 3, according to some embodiments. A model such as, for example, a computer vision model trained on the training data may be utilized to analyze the images and determine the visual environment based on the images.
[0074] At block 406, the method 400 includes identifying objects in the visual environment of the images, the objects including a target moving object. In some embodiments, the target object may be identified from a plurality of different objects or different types of objects. The method 400 for identifying the objects in the visual environment may be similar to the operations performed at block 304 in FIG. 3, according to some embodiments. The model such as, for example, the computer vision model trained on the training data may be utilized to analyze the images and identify the objects in the visual environment.
[0075] In some embodiments, identifying the objects includes classifying the moving object or objects based on a comparison of the target objects to the training dataset. In this regard, the method 400 may include obtaining the training dataset such as, for example, from block 310 in FIG. 3, to perform the object classification.
[0076] At block 408, the method 400 includes identifying joints of the target object in the visual environment of the images. The method 400 for identifying joints of the target object in the visual environment may be similar to operations performed at block 306 in FIG. 3, according to some embodiments. The model such as, for example, the computer vision model trained on the training data may be utilized to analyze the images and identify the joints of the target object as the object moves in the visual environment of the images. In some embodiments, the method 400 may include identifying a set of skeletal joints corresponding to a body of the target object and corresponding to, for example, a torso and limbs of the target moving object.
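As a hedged sketch of what identifying a set of skeletal joints can look like in data terms, the following Python fragment returns a named (x, y) coordinate for each joint per frame; `estimate_pose` is a hypothetical placeholder for whichever trained pose model is used and is not an API defined by this disclosure.

```python
# Hedged sketch: extracting a named set of skeletal joints for each frame.
from typing import Dict, Tuple

JOINT_NAMES = ["neck", "shoulder_l", "shoulder_r", "elbow_l", "elbow_r",
               "wrist_l", "wrist_r", "hip_l", "hip_r", "knee_l", "knee_r",
               "ankle_l", "ankle_r"]

def estimate_pose(frame) -> Dict[str, Tuple[float, float]]:
    """Hypothetical placeholder: return (x, y) image coordinates per detected joint."""
    return {name: (0.0, 0.0) for name in JOINT_NAMES}

def joints_for_sequence(frames):
    """Collect the per-frame joint sets for the target moving object."""
    return [estimate_pose(f) for f in frames]
```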
[0077] At block 410, the method 400 includes identifying a joint range of the target object in the visual environment of the images. The method 400 for identifying the joint range of the target object in the visual environment may be similar to operations performed at block 308 in FIG. 3, according to some embodiments. The model such as, for example, the computer vision model trained on the training data may
be utilized to analyze the images and identify the joint range of the target object as the object moves in the visual environment of the images. In some embodiments, identifying the joint range may include tracking one or more factors including, but not limited to, a position, rotation, acceleration, and direction of the set of skeletal joints.
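A minimal sketch of deriving a joint range from tracked joint positions, assuming finite-difference estimates of angular speed and acceleration and an illustrative frame rate, is shown below; the joint naming and formulas are assumptions, not the disclosed algorithm.

```python
# Hedged sketch: joint angle plus finite-difference speed/acceleration.
import numpy as np

def joint_angle(a, b, c) -> float:
    """Angle (radians) at joint b formed by points a-b-c, e.g. shoulder-elbow-wrist."""
    v1, v2 = np.asarray(a) - np.asarray(b), np.asarray(c) - np.asarray(b)
    cosang = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.arccos(np.clip(cosang, -1.0, 1.0)))

def joint_range(angles, dt=1 / 30):
    """Rotational limits plus angular speed and acceleration from a per-frame angle series."""
    angles = np.asarray(angles)
    speed = np.gradient(angles, dt)
    accel = np.gradient(speed, dt)
    return {"min": angles.min(), "max": angles.max(),
            "peak_speed": np.abs(speed).max(), "peak_accel": np.abs(accel).max()}

elbow_angles = [joint_angle((0, 0), (1, 0), (1 + np.cos(t), np.sin(t)))
                for t in np.linspace(0.1, 2.0, 60)]
print(joint_range(elbow_angles))
```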
[0078] At block 412, the method 400 includes comparing the data generated based on processing the images at blocks 402, 404, 406, 408, and 410 to the training data. In this regard, the training data may be obtained prior to, or during, block 412. The training data may correspond to, for example, training data generated at block 310 in FIG. 3.
[0079] In some embodiments, comparing the dataset to the training dataset further includes obtaining a set of second tracklets from the training dataset and comparing the set of tracklets determined at blocks 408 and 410 with the set of second tracklets to calculate the differences therebetween.
[0080] At block 414, the method 400 includes identifying differences between the data generated from tracking the target moving object at blocks 402, 404, 406, 408, and 410, relative to the training data. These differences may correspond to differences in a relative position of the object in the visual environment, differences in position of the object's torso and limbs, proportional differences, rotational differences, acceleration differences, speed differences, other differences, or any combinations thereof. Identifying these differences, or deltas, may include performing one or more calculations to identify these differences. In this regard, the training data may include therein corresponding data associated with the target object, one or more different objects similar in type to the target object, one or more different types of objects than the target object, or any combinations thereof. In some embodiments, a determination that one or more of the differences between the data corresponding to the pattern of motion and the training data exceed a predetermined threshold value or values is indicative of an anomalous movement sequence by the object. In some embodiments, identifying that the differences between the data generated from tracking the target moving object relative to the training data exceed
the predetermined threshold value or values may be indicative of physical issues with the object.
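The following hedged sketch illustrates one way such differences could be computed and tested against predetermined threshold values to flag an anomalous movement sequence; the metric names and threshold values are assumptions for illustration.

```python
# Hedged sketch: comparing tracked motion metrics against reference values and
# flagging an anomaly when any difference exceeds its predetermined threshold.
def motion_differences(tracked: dict, reference: dict) -> dict:
    """Per-metric deltas, e.g. for joint speed, rotation, acceleration."""
    return {k: abs(tracked[k] - reference[k]) for k in reference if k in tracked}

def is_anomalous(differences: dict, thresholds: dict) -> bool:
    """True when any difference exceeds its predetermined threshold value."""
    return any(delta > thresholds.get(metric, float("inf"))
               for metric, delta in differences.items())

reference = {"elbow_peak_speed": 4.2, "elbow_range": 1.9}
tracked = {"elbow_peak_speed": 6.0, "elbow_range": 1.1}
deltas = motion_differences(tracked, reference)
print(is_anomalous(deltas, {"elbow_peak_speed": 1.0, "elbow_range": 0.5}))  # True
```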
[0081] At any of blocks 402, 404, 406, 408, 410, 412, 414, and 416, the method 400 may include displaying images from the image sensor and/or any of the data generated based on performing the image processing techniques of method 400 at a display device. For example, the display may show images from the image sensor including an outline of the target object and a visual representation of the differences between the object in the images as compared to the training data (e.g., differences in limb position, rotation, speed, directions, etc.). In another example, the display may show images obtained at block 402 overlaid with images from the training data and including data values corresponding to joint positions, speed, rotation, direction, and the like. In yet another example, the display may show a graphical user interface configured to receive inputs from a user to enable performing the image processing techniques in accordance with the present disclosure. In some embodiments, displaying image data onto a display device may include displaying a visual indication of differences between the position and direction of the set of skeletal joints of the moving object based on a comparison to the reference dataset.
[0082] FIG. 5 is a graphical diagram illustrating a non-limiting example of a type of movement which may be detected and tracked, according to some embodiments. In FIG. 5, images 502 captured by an image sensor or sensors such as, for example, image sensor 110 in FIG. 1, may be displayed on display device 504. In these embodiments, the x-axis 506 is horizontal with respect to the orientation of the image sensor, the y-axis 508 is vertical with respect to the orientation and position of the image sensor, and the z-axis 510 is the direction from the image sensor to the tracked object. The axes of movement of detected objects are relative to the image captured by the image sensor or sensors. Each frame of the images may show the target object in a respective position such that the series of images may be translated to movement of the target object or movement of portions of the target object. In FIG. 5, the object's movement corresponds to
movement of the object's appendages (limbs) and joints relative to the x-axis 506, y-axis 508, and z-axis 510.
[0083] The display device 504 shows a visual environment 512 determined based on the images captured by the image sensor or sensors, and a target object 514 identified in the images for tracking. The display device 504 may also show, according to some embodiments, a mapping of a frame 516 and corresponding joints 518 associated with the object 514 in the visual environment 512. The joint range and pattern of movement of object 514 may be determined by calculating the translation of the joints 518 along a multi-axial coordinate system such as, for example, x-axis 506, y-axis 508, and z-axis 510 to determine factors such as, for example, position, proportion, rotation, speed, acceleration, direction, other like factors, or any combinations thereof. This data can then be used as training data to train models to perform the image processing techniques in accordance with the present disclosure, or compared to training data to determine differences in movement of the object 514 as compared to object(s) in the training data.
[0084] FIG. 6 is a flow diagram of a method 600 for performing the image processing techniques, according to some embodiments.
[0085] At block 602, the method 600 includes obtaining an image or a set of images from an image sensor. The image sensor may correspond to, for example, image sensor 110 in FIG. 1. In some embodiments, the image sensor may correspond to at least one of the multiple image sensors 206 in FIG. 2. In other embodiments, the images may be obtained from another computing device. In some embodiments, obtaining the image may include receiving a dataset comprising image data corresponding to a plurality of images from the at least one image sensor.
[0086] At block 604, the method 600 includes analyzing the image data to establish a visual environment based on the images. The method 600 for establishing the visual environment may be similar to operations performed at block 302 in FIG. 3, according to some embodiments. A model such as, for example, a computer vision model trained on the training data may be utilized to analyze the images and determine the visual environment based on the images.
[0087] At block 606, the method 600 includes identifying objects in the visual environment of the images, the objects including a target moving object. In some embodiments, the target object may be identified from a plurality of different objects or different types of objects. The method 600 for identifying the objects in the visual environment may be similar to the operations performed at block 304 in FIG. 3, according to some embodiments. The model such as, for example, the computer vision model trained on the training data may be utilized to analyze the images and identify the objects in the visual environment.
[0088] In some embodiments, identifying the objects includes classifying the moving object or objects based on a comparison of the target objects to the training dataset. In this regard, the method 600 may include obtaining the training dataset such as, for example, from block 310 in FIG. 3, to perform the object classification. In some embodiments, identifying the object includes identifying one or more points of interest (or regions of interest) associated with the object. These points of interest may include, but are not limited to, a head, neck, torso, arms, legs, hands, fingers, feet, other parts, or combinations thereof. For example, the regions of interest may include a torso, a left arm, and a right arm.
[0089] At block 608, the method 600 includes identifying joints of the target object in the visual environment of the images. The method 600 for identifying joints of the target object in the visual environment may be similar to operations performed at block 306 in FIG. 3, according to some embodiments. The model such as, for example, the computer vision model trained on the training data may be utilized to analyze the images and identify the joints of the target object as the object moves in the visual environment of the images. In some embodiments, the method 600 may include identifying a set of skeletal joints corresponding to a body of the target object and corresponding to, for example, a torso and limbs of the target moving object.
[0090] At block 610, the method 600 includes identifying a joint range of the target object in the visual environment of the images. The method 600 for identifying the joint range of the target object in the visual environment may be similar to operations performed at block 308 in FIG. 3, according to some embodiments. The model
such as, for example, the computer vision model trained on the training data may be utilized to analyze the images and identify the joint range of the target object as the object moves in the visual environment of the images. In some embodiments, identifying the joint range may include tracking one or more factors including, but not limited to, a position, rotation, acceleration, and direction of the set of skeletal joints.
[0091] At block 612, the method 600 includes determining coordinates tracking the object's movement based on the images. That is, coordinates for the one or more points of interest can be tracked as the object moves in the set of images and based on processing of the images at blocks 602, 604, 606, 608, and 610. In some embodiments, the points of interest may also be determined (e.g., identified and classified) based on the training data. The coordinates may correspond to the position of the points of interest in the visual environment at each image. In this regard, the visual environment may be a virtual environment based on a coordinate system and the movement of the object in the images can be tracked using the coordinate scheme of the virtual environment. In some embodiments, the coordinate scheme may be a coordinate scheme of a computing device performing the operations at blocks 602, 604, 606, 608, 610, and 612. In other embodiments, the coordinate scheme may be a coordinate scheme of another computing device.
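As a non-limiting sketch of recording coordinates for the points of interest in a virtual-environment coordinate scheme, the fragment below normalizes pixel positions into a unit-square scheme; the normalization and naming are illustrative assumptions rather than the disclosed coordinate scheme.

```python
# Hedged sketch: per-image coordinates of points of interest mapped into a
# device-independent virtual coordinate scheme.
def to_virtual_coords(pixel_xy, image_width, image_height):
    """Map pixel coordinates into a [0, 1] x [0, 1] virtual coordinate scheme."""
    x, y = pixel_xy
    return (x / image_width, y / image_height)

def track_points_of_interest(per_frame_points, image_width, image_height):
    """per_frame_points: list of {name: (x_px, y_px)} dicts, one per image."""
    return [
        {name: to_virtual_coords(xy, image_width, image_height)
         for name, xy in frame_points.items()}
        for frame_points in per_frame_points
    ]

frames = [{"wrist_r": (320.0, 240.0)}, {"wrist_r": (332.0, 236.0)}]
print(track_points_of_interest(frames, 640, 480))
```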
[0092] In some embodiments, determining the coordinates further includes obtaining a set of second tracklets from the training dataset and comparing the set of tracklets determined at blocks 608 and 610 with the set of second tracklets to calculate the differences therebetween.
[0093] In some embodiments, the method 600 includes identifying differences between the data generated from tracking the target moving object at blocks 602, 604, 606, 608, and 610, relative to the training data. These differences may correspond to differences in a relative position of the object in the visual environment, differences in position of the object's torso and limbs, proportional differences, rotational differences, acceleration differences, speed differences, other differences, or any combinations thereof. Identifying these differences, or deltas, may include performing one or more calculations to identify these differences. In this regard,
the training data may include therein corresponding data associated with the target object, one or more different objects similar in type to the target object, one or more different types of objects than the target object, or any combinations thereof. In some embodiments, a determination that one or more of the differences between the data corresponding to the pattern of motion and the training data exceed a predetermined threshold value or values is indicative of an anomalous movement sequence by the object. In some embodiments, identifying that the differences between the data generated from tracking the target moving object relative to the training data exceed the predetermined threshold value or values may be indicative of physical issues with the object.
[0094] At any of blocks 602, 604, 606, 608, 610, 612, 614, and 616, the method 600 may include displaying images from the image sensor and/or any of the data generated based on performing the image processing techniques of method 600 at a display device. For example, the display may show images from the image sensor including an outline of the target object and a visual representation of the differences between the object in the images as compared to the training data (e.g., differences in limb position, rotation, speed, directions, etc.). In another example, the display may show images obtained at block 602 overlaid with images from the training data and including data values corresponding to joint positions, speed, rotation, direction, and the like. In yet another example, the display may show a graphical user interface configured to receive inputs from a user to enable performing the image processing techniques in accordance with the present disclosure. In some embodiments, displaying image data onto a display device may include displaying a visual indication of differences between the position and direction of the set of skeletal joints of the moving object based on a comparison to the reference dataset.
[0095] FIG. 7 is a flow diagram of a method 700 according to some embodiments. In some embodiments, the method 700 may be an embodiment of blocks 602, 604, 606, 608, 610, and 612 of FIG. 6.
[0096] At block 702, the method 700 includes obtaining a first image data. The first image data may include a first set of images. In some embodiments, the first image data
may include a first set of coordinate data. In other embodiments, the first image data may include other data such as, for example, data generated while performing operations 602, 604, 606, 608, 610, 612, or any combinations thereof, from FIG. 6 on the first set of images. The first image data may correspond to data based on images captured during a first time period. For example, the first image data may include images of a person performing certain movements with the right arm and a first set of coordinate data corresponding to the set of movements.
[0097] At block 704, the method 700 includes obtaining a second image data. The second image data may include a second set of images. In some embodiments, the second image data may include a second set of coordinate data. In other embodiments, the second image data may include other data such as, for example, data generated while performing operations 602, 604, 606, 608, 610, 612, or any combinations thereof, from FIG. 6 on the second set of images.
[0098] The second image data may include images captured during a second time period, the second time period occurring after the first time period. For example, the second image data may include images of the person performing certain movements using a prosthetic device that has replaced their right arm, or a portion thereof.
[0099] At 706, the method 700 includes determining a translation based on the first image data and the second image data. In some embodiments, the first image data includes a first set of coordinates and the second image data includes a second set of coordinates, and determining the translation includes determining a dataset to enable translating the second set of coordinates to the first set of coordinates in the virtual environment. In some embodiments, the translation may be based on a coordinate system of the virtual environment. In other embodiments, the translation may be based on a coordinate system of another computing device. For example, a prosthetic device may include a computing device such as, for example, computing device as shown at block 710, the computing device including a processor and a memory and including instructions for controlling an operation of the prosthetic device including coordinate data for certain movements, and the translation may be based on a coordinate system of the prosthetic device.
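A minimal sketch of determining such a translation, assuming corresponding points of interest in both coordinate sets and a simple least-squares offset with per-point residuals serving as candidate refinement data, is shown below; this is an illustrative computation, not the disclosed algorithm.

```python
# Hedged sketch: translation mapping the second coordinate set onto the first
# in the virtual environment, plus per-point refinement deltas.
import numpy as np

def coordinate_translation(first: np.ndarray, second: np.ndarray):
    """first, second -- N x D arrays of corresponding point-of-interest coordinates."""
    offset = first.mean(axis=0) - second.mean(axis=0)   # best single translation
    residuals = first - (second + offset)                # per-point refinement deltas
    return offset, residuals

first = np.array([[0.50, 0.40], [0.55, 0.45], [0.60, 0.42]])   # e.g. first set of coordinates
second = np.array([[0.47, 0.38], [0.52, 0.44], [0.58, 0.41]])  # e.g. second set of coordinates
offset, refinement = coordinate_translation(first, second)
print(offset, refinement)
```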
[00100] In some embodiments, the method 700 includes obtaining from computing device 710 a third set of coordinates corresponding to a position of the object in the second set of images. The third set of coordinates may correspond to a position of the object in a real-life environment of the object. In this regard, in some embodiments, the method 700 further includes refining the translation of the second set of coordinates to the first set of coordinates based on the third set of coordinates.
[00101] At 708, the method 700 includes determining refinement data. In some embodiments, the refinement data may correspond to the translation data for refining the position of the object, or the points of interest associated with the object, from the second set of coordinates to the first set of coordinates. That is, the refinement data may include one or more data points to translate the second set of coordinates to the first set of coordinates. In some embodiments, the refinement dataset may be based on a coordinate scheme of the virtual environment. In other embodiments, the refinement dataset may be based on a coordinate scheme of the other computing device such as, for example, computing device at block 710.
[00102] At 710, the method 700 may include sending the refinement data to the computing device 710. In some embodiments, the refinement data may further include other data including, but not limited to, the first set of images, second set of images, first set of coordinates, second set of coordinates, third set of coordinates, training data, other data related to determining the position of the object, or any combinations thereof.
[00103] All prior patents and publications referenced herein are incorporated by reference in their entireties.
[00104] Throughout the specification and claims, the following terms take the meanings explicitly associated herein, unless the context clearly dictates otherwise. The phrases "in one embodiment," “in an embodiment,” and "in some embodiments" as used herein do not necessarily refer to the same embodiment(s), though it may. Furthermore, the phrases "in another embodiment" and "in some other embodiments" as used herein do not necessarily refer to a different
embodiment, although it may. All embodiments of the disclosure are intended to be combinable without departing from the scope or spirit of the disclosure.
[00105] As used herein, the term "based on" is not exclusive and allows for being based on additional factors not described, unless the context clearly dictates otherwise. In addition, throughout the specification, the meaning of "a," "an," and "the" include plural references. The meaning of "in" includes "in" and "on."
[00106] As used herein, the term “between” does not necessarily require being disposed directly next to other elements. Generally, this term means a configuration where something is sandwiched by two or more other things. At the same time, the term “between” can describe something that is directly next to two opposing things. Accordingly, in any one or more of the embodiments disclosed herein, a particular structural component being disposed between two other structural elements can be: disposed directly between both of the two other structural elements such that the particular structural component is in direct contact with both of the two other structural elements; disposed directly next to only one of the two other structural elements such that the particular structural component is in direct contact with only one of the two other structural elements; disposed indirectly next to only one of the two other structural elements such that the particular structural component is not in direct contact with only one of the two other structural elements, and there is another element which juxtaposes the particular structural component and the one of the two other structural elements; disposed indirectly between both of the two other structural elements such that the particular structural component is not in direct contact with both of the two other structural elements, and other features can be disposed therebetween; or any combination(s) thereof.
[00107] The following Clauses provide exemplary embodiments according to the disclosure herein. Any feature(s) in any of the Clause(s) can be combined with any other Clause(s).
Clause 1. A system for tracking body kinetics comprising: a processor; and a non-transitory computer readable media having stored therein instructions that are executable by the processor to perform operations including: obtain a first dataset corresponding to a first set of images, the first set of images including a moving object; determine a visual environment based on the first set of images; analyze the first set of images to identify and track the moving object in the visual environment; identify points of interest in the first set of images representative of the moving object; and generate a first set of coordinates as output based on a position of the points of interest in the visual environment.
Clause 2. The system of clause 1 , wherein the operations performed by the processor further comprising: obtain a second dataset corresponding to a second set of images, the second set of images including the moving object; analyze the second set of images to identify and track the moving object in the visual environment; identify points of interest in the second set of images representative of the moving object; and generate a second set of coordinates as output based on the position of the points of interest in the visual environment.
Clause 3. The system according to any of clauses 1 -2, wherein the operations performed by the processor further comprising: determine a translation between the first set of coordinates and the second set of coordinates and generate a refinement dataset based on the translation, wherein the refinement dataset corresponds to a translation from the second set of coordinates to the first set of coordinates.
Clause 4. The system according to any of clauses 1 -3, wherein the operations performed by the processor further comprising: obtain, from a second computing device, a third set of coordinates; wherein the translation is further determined based on the third set of coordinates.
Clause 5. The system according to any of clauses 1-4, wherein the third set of coordinates corresponds to a movement of a prosthetic device in a real-life
environment of the object, the movement of the prosthetic device being captured in the second set of images.
Clause 6. The system according to any of clauses 1-5, further comprising: a user input device in communicable connection with the processor.
Clause 7. The system according to any of clauses 1-6, wherein the user input device comprises a touch-screen interface integral with a display.
Clause 8. The system according to any of clauses 1-7, wherein the processor is configured to identify the moving object in image data based on regions selected using the user input device.
Clause 9. The system according to any of clauses 1-8, wherein the processor further performs operations comprising: associate a first set of tracklets with the moving object based on the first set of images, the first set of tracklets corresponding to the points of interest of the moving object; determine a range of motion and a pattern of motion of the moving object based on a location of the first set of tracklets; associate a second set of tracklets with the moving object based on the second set of images, the second set of tracklets corresponding to the points of interest of the moving object; and determine a range of motion and a pattern of motion of the moving object based on the location of the second set of tracklets.
Clause 10. The system according to any of clauses 1 -9, wherein the processor further performs operations comprising: obtain a training dataset comprising image data corresponding to movement of one or more objects; and train a machine learning model based on the training dataset to enable determining sets of coordinates corresponding to the moving object in images; wherein the processor identifies the moving object, the range of motion, and the pattern of motion of the moving object based on the training dataset.
Clause 11. The system according to any of clauses 1-10, wherein the refinement dataset is configured to refine a certain movement of a prosthetic device in a real-life environment of the object so that the certain movement is similar to the moving object in the first set of images.
Clause 12. A computer-implemented method comprising: obtaining, by a first computing device, a first dataset corresponding to a first set of images, the first set of images including a moving object; determining a visual environment based on the first set of images; identifying and tracking the moving object in the visual environment based on the first set of images; identifying points of interest in the first set of images representative of the moving object; and generating a first set of coordinates as output based on a position of the points of interest in the first set of images.
Clause 13. The computer-implemented method of clause 12, the method further comprising: obtaining, by the first computing device, a second dataset corresponding to a second set of images, the second set of images including the moving object; identifying and tracking the moving object in the visual environment; identifying points of interest in the second set of images representative of the moving object; generating a second set of coordinates as output based on the position of the points of interest in the second set of images; and determining a translation between the first set of coordinates and the second set of coordinates and generate a refinement dataset based on the translation; wherein the refinement dataset corresponds to a translation from the second set of coordinates to the first set of coordinates.
Clause 14. The computer-implemented method according to any of clauses 12-13, the method further comprising: receiving an input at a user input device to enable identifying the points of interest in the first set of images and the second set of images; wherein the points of interest are identified in the first set of images and the second set of images based on the input; and wherein the user input device is in communicable connection with a processor of the first computing device.
Clause 15. The computer-implemented method according to any of clauses 12-14, wherein the user input device comprises a touch-screen interface integral with a display of the first computing device.
Clause 16. The computer-implemented method according to any of clauses 12-15, the method further comprising: obtaining, from a second computing device, a third
set of coordinates; wherein the translation is further determined based on the third set of coordinates.
Clause 17. The computer-implemented method according to any of clauses 12-16, wherein the third set of coordinates corresponds to a movement of a prosthetic device in a real-life environment of the moving object, the movement of the prosthetic device being captured in the second set of images.
Clause 18. The computer-implemented method according to any of clauses 12-17, the method further comprising: associating a first set of tracklets with the moving object based on the first set of images, the first set of tracklets corresponding to the points of interest of the moving object; determining a range of motion and a pattern of motion of the moving object based on a location of the first set of tracklets; associating a second set of tracklets with the moving object based on the second set of images, the second set of tracklets corresponding to the points of interest of the moving object; and determining a range of motion and a pattern of motion of the moving object based on the location of the second set of tracklets.
Clause 19. The computer-implemented method according to any of clauses 12-18, the method further comprising: obtaining a training dataset comprising image data corresponding to movement of one or more objects; and training a machine learning model based on the training dataset to enable determining sets of coordinates corresponding to the moving object in images; wherein the first computing device identifies the moving object, the range of motion, and the pattern of motion of the moving object based on the training dataset.
Clause 20. The computer-implemented method according to any of clauses 12-19, wherein the refinement dataset is configured to refine a certain movement of a prosthetic device in a real-life environment of the moving object so that a certain movement performed by the prosthetic device of the moving object is similar to the moving object in the first set of images.
Claims
1. A system for tracking body kinetics comprising: a processor; and a non-transitory computer readable media having stored therein instructions that are executable by the processor to perform operations including: obtain a first dataset corresponding to a first set of images, the first set of images including a moving object; determine a visual environment based on the first set of images; analyze the first set of images to identify and track the moving object in the visual environment; identify points of interest in the first set of images representative of the moving object; and generate a first set of coordinates as output based on a position of the points of interest in the visual environment.
2. The system of claim 1 , wherein the operations performed by the processor further comprising: obtain a second dataset corresponding to a second set of images, the second set of images including the moving object; analyze the second set of images to identify and track the moving object in the visual environment; identify points of interest in the second set of images representative of the moving object; and generate a second set of coordinates as output based on the position of the points of interest in the visual environment.
3. The system of claim 2, wherein the operations performed by the processor further comprising:
determine a translation between the first set of coordinates and the second set of coordinates and generate a refinement dataset based on the translation, wherein the refinement dataset corresponds to a translation from the second set of coordinates to the first set of coordinates.
4. The system of claim 3, wherein the operations performed by the processor further comprising: obtain, from a second computing device, a third set of coordinates; wherein the translation is further determined based on the third set of coordinates.
5. The system of claim 4, wherein the third set of coordinates corresponds to a movement of a prosthetic device in a real-life environment of the object, the movement of the prosthetic device being captured in the second set of images.
6. The system of claim 1, further comprising: a user input device in communicable connection with the processor.
7. The system of claim 6, wherein the user input device comprises a touch-screen interface integral with a display.
8. The system of claim 7, wherein the processor is configured to identify the moving object in image data based on regions selected using the user input device.
9. The system of claim 2, wherein the processor further performs operations comprising: associate a first set of tracklets with the moving object based on the first set of images, the first set of tracklets corresponding to the points of interest of the moving object; determine a range of motion and a pattern of motion of the moving object based on a location of the first set of tracklets;
associate a second set of tracklets with the moving object based on the second set of images, the second set of tracklets corresponding to the points of interest of the moving object; and determine a range of motion and a pattern of motion of the moving object based on the location of the second set of tracklets.
10. The system of claim 9, wherein the processor further performs operations comprising: obtain a training dataset comprising image data corresponding to movement of one or more objects; and train a machine learning model based on the training dataset to enable determining sets of coordinates corresponding to the moving object in images; wherein the processor identifies the moving object, the range of motion, and the pattern of motion of the moving object based on the training dataset.
11. The system of claim 3, wherein the refinement dataset is configured to refine a certain movement of a prosthetic device in a real-life environment of the object so that the certain movement is similar to the moving object in the first set of images.
12. A computer-implemented method comprising: obtaining, by a first computing device, a first dataset corresponding to a first set of images, the first set of images including a moving object; determining a visual environment based on the first set of images; identifying and tracking the moving object in the visual environment based on the first set of images; identifying points of interest in the first set of images representative of the moving object; and generating a first set of coordinates as output based on a position of the points of interest in the first set of images.
13. The computer-implemented method of claim 12, the method further comprising:
obtaining, by the first computing device, a second dataset corresponding to a second set of images, the second set of images including the moving object; identifying and tracking the moving object in the visual environment; identifying points of interest in the second set of images representative of the moving object; generating a second set of coordinates as output based on the position of the points of interest in the second set of images; and determining a translation between the first set of coordinates and the second set of coordinates and generate a refinement dataset based on the translation; wherein the refinement dataset corresponds to a translation from the second set of coordinates to the first set of coordinates.
14. The computer-implemented method of claim 13, the method further comprising: receiving an input at a user input device to enable identifying the points of interest in the first set of images and the second set of images; wherein the points of interest are identified in the first set of images and the second set of images based on the input; and wherein the user input device is in communicable connection with a processor of the first computing device.
15. The computer-implemented method of claim 14, wherein the user input device comprises a touch-screen interface integral with a display of the first computing device.
16. The computer-implemented method of claim 13, the method further comprising: obtaining, from a second computing device, a third set of coordinates; wherein the translation is further determined based on the third set of coordinates.
17. The computer-implemented method of claim 16, wherein the third set of coordinates corresponds to a movement of a prosthetic device in a real-life environment of the
moving object, the movement of the prosthetic device being captured in the second set of images.
18. The computer-implemented method of claim 13, the method further comprising: associating a first set of tracklets with the moving object based on the first set of images, the first set of tracklets corresponding to the points of interest of the moving object; determining a range of motion and a pattern of motion of the moving object based on a location of the first set of tracklets; associating a second set of tracklets with the moving object based on the second set of images, the second set of tracklets corresponding to the points of interest of the moving object; and determining a range of motion and a pattern of motion of the moving object based on the location of the second set of tracklets.
19. The computer-implemented method of claim 18, the method further comprising: obtaining a training dataset comprising image data corresponding to movement of one or more objects; and training a machine learning model based on the training dataset to enable determining sets of coordinates corresponding to the moving object in images; wherein the first computing device identifies the moving object, the range of motion, and the pattern of motion of the moving object based on the training dataset.
20. The computer-implemented method of claim 13, wherein the refinement dataset is configured to refine a certain movement of a prosthetic device in a real-life environment of the moving object so that a certain movement performed by the prosthetic device of the moving object is similar to the moving object in the first set of images.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202463641047P | 2024-05-01 | 2024-05-01 | |
| US63/641,047 | 2024-05-01 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025231080A1 (en) | 2025-11-06 |
Family
ID=97562157
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2025/026999 (WO2025231080A1, pending) | Systems, devices, and computerized methods for tracking and displaying moving objects on mobile devices | 2024-05-01 | 2025-04-30 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025231080A1 (en) |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20200272148A1 (en) * | 2019-02-21 | 2020-08-27 | Zoox, Inc. | Motion prediction based on appearance |
| KR20220050018A (en) * | 2020-10-15 | 2022-04-22 | 한국전자통신연구원 | Image based behavior recognition system for rehabilitation patient and method thereof |
| US20220375119A1 (en) * | 2021-05-07 | 2022-11-24 | Ncsoft Corporation | Electronic device, method, and computer readable storage medium for obtaining video sequence including visual object with postures of body independently from movement of camera |
| US20230300281A1 (en) * | 2022-03-18 | 2023-09-21 | Ncsoft Corporation | Electronic device, method, and computer readable recording medium for synchronizing videos based on movement of body |
| US20240119353A1 (en) * | 2022-10-07 | 2024-04-11 | Toyota Jidosha Kabushiki Kaisha | Training data generation method and training data generation system |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 25798611; Country of ref document: EP; Kind code of ref document: A1 |