
WO2025231077A1 - Systems, devices, and computerized methods for tracking and displaying moving objects on mobile devices - Google Patents

Systems, devices, and computerized methods for tracking and displaying moving objects on mobile devices

Info

Publication number
WO2025231077A1
Authority
WO
WIPO (PCT)
Prior art keywords
dataset
images
visual environment
identifying
differences
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2025/026991
Other languages
French (fr)
Inventor
Christopher Francis EBBERT
John Walter Perry
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Publication of WO2025231077A1 publication Critical patent/WO2025231077A1/en
Pending legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content

Definitions

  • the present disclosure relates to systems, devices, and computerized methods for tracking and displaying moving objects on, for example, mobile devices.
  • the tracking of moving objects can be applied towards a broad range of applications, for example, when evaluating an object's response to conditioning based on body and limb positions during a series of movements.
  • When performing such evaluations, a user typically relies on visual inspection and/or manual tracking of body and limb positions in an image, and then compares the image to other image(s) to identify differences therebetween.
  • the accuracy, reliability, and functionality of these techniques may be limited by the need for a certain amount of human intervention and by external factors such as, for example, the variability of objects, visual obstructions in images, and limits on viewing angles in the captured images, all of which can affect the overall performance of the motion detection and analysis.
  • the need for user input or feedback to analyze movement patterns and to make determinations on the significance of variances can render the process highly subjective and the reliability of any determinations made based on such analysis can demonstrate inconsistencies due to internal factors such as, for example, the user’s age, bias, experience, and other such factors.
  • a system includes a processor and a non-transitory computer readable medium having stored therein instructions that are executable by the processor to perform operations for tracking object movement, the operations including: establish a visual environment based on a set of images; identify an object in the visual environment; identify a set of joints for the object and associate the set of joints with the object in the visual environment; determine a pattern of motion based on a movement of the set of joints in the visual environment and generate a dataset corresponding to the pattern of motion of the object; compare the dataset to a training dataset; identify differences between the dataset and the training dataset; and display a visual representation of the differences between the dataset and the training dataset at a display device.
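  • As one illustration of the kind of "dataset corresponding to the pattern of motion" referred to above, the following is a minimal, hypothetical Python structure; the field names and layout are assumptions for illustration only, not a format defined by this disclosure.

```python
from dataclasses import dataclass, field

@dataclass
class JointSample:
    """State of a single joint in one image frame (hypothetical layout)."""
    name: str                # e.g. "left_knee"
    position: tuple          # (x, y, z) in the visual environment's coordinate system
    rotation_deg: float      # joint angle in this frame
    speed: float             # derived from successive positions between frames
    acceleration: float      # change in speed between frames

@dataclass
class MotionPatternDataset:
    """Pattern-of-motion dataset for one tracked object over a set of images."""
    object_id: str
    frames: list = field(default_factory=list)   # one list of JointSample per image

    def add_frame(self, samples):
        self.frames.append(list(samples))
```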
  • the processor further performs operations including receive image data captured by an image sensor, the image data corresponding to the set of images of a scene including the object.
  • the processor further performs operations including comparing the differences to a reference dataset to predict a physical issue with the object.
  • the system further includes an image sensor.
  • the image sensor captures the set of images.
  • the system further includes a display device.
  • the display device displays the set of images.
  • the display device further includes a touch-screen interface integral with the display device configured to receive one or more inputs from a user.
  • the system further includes a user input device configured to receive one or more inputs from a user.
  • identifying the object in the visual environment further includes identifying a first region in the visual environment corresponding to pixels representative of the object. In some embodiments, the first region is identified based on an input received from a user corresponding to the object for tracking. In some embodiments, identifying the object in the visual environment further includes identifying a second region in the visual environment corresponding to pixels representative of other objects, and identifying a third region in the visual environment corresponding to pixels representative of a background.
  • identifying the set of joints further including: identify, for each image, a set of tracklets based on a skeletal frame of the object, each tracklet in the set of tracklets being associated with a respective limb of the object; and calculate, between each image, movement of the respective set of tracklets to determine the pattern of motion for the object.
  • comparing the dataset to the training dataset further including: obtain a set of second tracklets from the training dataset; compare the set of tracklets with the set of second tracklets to calculate the differences therebetween; and determine one or more of the differences exceed a predetermined threshold indicative of an anomalous movement sequence by the object.
  • the differences exceeding the predetermined threshold is indicative of physical issues with the object.
  • the processor further performs operations including: train a model using training data to enable improved tracking of objects based on the set of images, the training data including the dataset corresponding to the pattern of motion of the object.
  • a computer-implemented method including: establishing a visual environment based on a set of images; identifying an object in the visual environment; identifying a set of joints for the object and associating the set of joints with the object in the visual environment; determining a pattern of motion based on a movement of the set of joints in the visual environment and generating a dataset corresponding to the pattern of motion of the object; and generating a training dataset including data corresponding to the pattern of motion of the object and training a model using the training dataset to enable improved tracking of objects based on captured images.
  • the method further including displaying the dataset corresponding to the pattern of motion of the object at a display device to provide a visual representation of the pattern of motion.
  • the training dataset further includes one or more physical issues associated with the patterns of motion.
  • identifying the object in the visual environment further including identifying a first region in the images corresponding to pixels representative of the object.
  • the first region is identified based on an input received from a user corresponding to the object for tracking.
  • identifying the object in the visual environment further including identifying a second region in the images corresponding to pixels representative of other objects, and identifying a third region in the images corresponding to pixels representative of a background.
  • identifying the set of joints further including: identifying, for each image, a set of tracklets based on a skeletal frame of the object, each tracklet in the set of tracklets being associated with a respective limb of the object; and calculating, between each image, movement of the respective set of tracklets to determine the pattern of motion for the object.
  • a non-transitory computer readable medium having stored therein instructions executable by a processor to perform operations for tracking moving objects in a set of images captured by an image sensor, the operations including: receive image data from the image sensor, the image data corresponding to a set of images including an object; establish a visual environment based on the set of images; identify a first region in the visual environment corresponding to pixels representative of the object; identify a set of joints for the object and associate the set of joints with the object in the visual environment; determine a pattern of motion based on a movement of the set of joints in the visual environment and generate a dataset corresponding to the pattern of motion of the object; compare the dataset to a training dataset and identify differences between the dataset and the training dataset; compare the differences to a reference dataset to predict a physical issue with the object; display a visual representation of the differences between the dataset and the training dataset at a display device; and train a model using training data to enable improved tracking of objects based on the set of images, the training data including the dataset corresponding to the pattern of motion of the object.
  • identifying the set of joints further including: identify, for each image, a set of tracklets based on a skeletal frame of the object, each tracklet in the set of tracklets being associated with a respective limb of the object; and calculate, between each image, a movement of the respective set of tracklets to determine the pattern of motion for the object.
  • comparing the dataset to the training dataset further including: obtain a set of second tracklets from the training dataset; compare the set of tracklets with the set of second tracklets to calculate the differences therebetween; and determine one or more of the differences exceed a predetermined threshold indicative of an anomalous movement sequence by the object.
  • the differences exceeding the predetermined threshold is indicative of physical issues with the object.
  • FIG. 1 is a schematic diagram of a non-limiting example of a system, according to some embodiments.
  • FIG. 2 is a schematic diagram of a non-limiting example of a system, according to some embodiments.
  • FIG. 3 is an exemplary flow diagram of a method for training models for performing image processing techniques, according to some embodiments.
  • FIG. 4 is an exemplary flow diagram of a method for performing the image processing techniques, according to some embodiments.
  • FIG. 5 is a graphical diagram illustrating a non-limiting example of a type of movement which may be detected and tracked, according to some embodiments.
  • Various embodiments of the present disclosure are directed to tracking of moving objects in images in near real-time using computer-based models including, for example, image processing models.
  • the advancements described herein may be provided in computing devices such as, for example, in mobile computing devices.
  • the computing device may include a processor, a non-transitory computer readable media device such as, for example, a memory or hard-disk drive, and one or more sensor devices for capturing images such as, for example, a camera.
  • the computing devices can include instructions stored in the memory such as, for example, software applications executable by the processor to perform the various embodiments described herein.
  • the models utilized in these systems and computing devices may include, for example, machine vision models that can replace vision/recognition tasks traditionally performed by humans or tasks that traditionally needed a certain amount of human input or feedback to process the image processing techniques in real-time.
  • the image processing techniques that may be applied by these models can include, according to some embodiments, processing images or a series of images to identify and track an object or objects in the images.
  • the image processing techniques applied by the models can include processing the images to identify and track the movement of objects in a visual environment generated based on the images, to identify and track the movement of portions of these objects (e.g., arms, legs, wings, etc.), and to make predictions of a condition of the objects based on the analysis.
  • the moving objects captured in these images may be in fields such as, for example, sports, physical therapy, physical conditioning, medical diagnostics, cell biology, astronomy, ornithology, equestrian sports, and the like.
  • Such objects may include, for example, humans, horses, birds, reptiles, and other like objects.
  • the objects captured in the images may be at a cellular level.
  • the objects may include astronomical objects, vehicles, automobiles, motorcycles, bicycles, airplanes, drones, and other like objects.
  • Systems including these models can apply the image processing techniques to track object(s) in the captured images or video such as, for example, an athlete’s movements.
  • the images may be captured by an external device such as, for example, a handheld DSLR camera, and a computing device may obtain the images from the external device to be processed using the models described herein.
  • the image processing techniques may be used to compare images of an object’s movement either between images or between sets of images, such as between images captured during different points in time to make predictions associated with the object based on the images.
  • the system may process images capturing a person currently undergoing physical therapy performing a sequence of movements to make certain predictions of the type of injury or to provide recommendations for therapeutic exercises that can be performed to help correct for any identified issues.
  • the techniques can include determining data corresponding to differences in the object’s movement based on a comparison of the images to reference data, and visually displaying the differences between the object’s movement in the images and the reference data.
  • a user may, for example, utilize a mobile computing device (e.g., a mobile cellular phone) to capture images of an object, such as a horse, performing a sequence of movements.
  • the mobile computing device may process the images to identify differences between the horse's movement in the images compared to historical reference data and may visually display the differences on the display of the mobile computing device in near real-time.
  • the image processing techniques for tracking of moving objects in images may include calculating proportional differences, rotational differences, acceleration differences, speed differences, other like data, or any combinations thereof, between the moving object in the images and the reference data.
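  • By way of a hedged illustration only, the proportional, rotational, speed, and acceleration differences mentioned above could be computed along the following lines, assuming the tracked motion and the reference data are both available as per-frame NumPy arrays of joint positions and joint angles; none of these names or formulas come from the disclosure itself.

```python
import numpy as np

def movement_differences(joint_pos, ref_pos, joint_angles, ref_angles, fps=30.0):
    """Compare a tracked motion sequence against reference data (illustrative sketch).

    joint_pos, ref_pos: arrays of shape (frames, joints, 3) with joint coordinates.
    joint_angles, ref_angles: arrays of shape (frames, joints) with joint angles in degrees.
    The two sequences are assumed to be time-aligned and of equal length.
    """
    dt = 1.0 / fps
    speed = np.linalg.norm(np.diff(joint_pos, axis=0), axis=2) / dt       # (frames-1, joints)
    ref_speed = np.linalg.norm(np.diff(ref_pos, axis=0), axis=2) / dt
    accel = np.diff(speed, axis=0) / dt
    ref_accel = np.diff(ref_speed, axis=0) / dt

    limb_len = np.linalg.norm(joint_pos[:, 1:] - joint_pos[:, :-1], axis=2)   # crude limb-length proxy
    ref_len = np.linalg.norm(ref_pos[:, 1:] - ref_pos[:, :-1], axis=2)

    return {
        "rotational": np.abs(joint_angles - ref_angles).mean(axis=0),     # per-joint angle delta
        "speed": np.abs(speed - ref_speed).mean(axis=0),
        "acceleration": np.abs(accel - ref_accel).mean(axis=0),
        "proportional": np.abs(limb_len / ref_len - 1.0).mean(axis=0),    # relative size mismatch
    }
```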
  • a user may utilize these image processing techniques to track an athlete's swing sequence in images captured during different time periods and view differences in the athlete's swing sequence by displaying the images captured during one time period relative to images captured during another time period.
  • the image processing techniques can include determining the object is demonstrating movements associated with certain physical issues based on comparing the object’s tracked movements from the captured images to reference data and determining data corresponding to differences between the object’s movement in the captured images and the reference data.
  • the reference data may include data of similar movements by the same athlete.
  • the reference data may include historical image data of the same object that is captured in the images being processed.
  • the reference data may include historical image data of objects that are similar in type to the object captured in the images being processed. Based on the difference data generated from the object's tracked movements and on data corresponding to similar movements of objects from image data in the reference dataset, a prediction associated with the object can be made.
  • the system may provide a prediction, based on the athlete’s tracked movements in the images and based on the processing of those images, that the athlete displays indications of decreased mobility in their leg due to, for example, a pulled hamstring muscle, or some other type of physical issue that affects the athlete’s movement.
  • the image processing techniques for tracking of moving objects in images may include generating and providing display data to a display device, the display data corresponding to visual representations highlighting variances in the object's movement relative to reference data (for example, historical image data), the variances being determined by comparing features extracted from the captured images to the reference dataset.
  • the reference data may correspond to pre-captured image data of different object types including, for example, data of the certain moving object being analyzed in the images.
  • the system may, based on the data corresponding to differences determined between the image data and the reference data, generate and output display data corresponding to the visual representations of real-time highlighting in the captured image or images of the moving object, or may display such data on a display in electronic communication with the system.
  • the highlighting may, according to some embodiments, be indicative of one or more differences including, but not limited to, joint limit, joint speed, joint acceleration, joint distance, object speed, object acceleration, object distance, other indications, or any combinations thereof.
  • FIG. 1 is a schematic diagram illustrating a non-limiting example of a system 100, according to some embodiments.
  • System 100 may be a computing device such as, for example, a personal computing device associated with a user.
  • system 100 may be a mobile computing device such as, for example, a smart cellular telephone, tablet, laptop, personal digital assistant (PDA), augmented reality (AR) device such as a headset, or other like devices.
  • System 100 may include side 102 and side 104 opposite the side 102. In FIG. 1 , both the sides 102, 104 of system 100 are shown for simplicity purposes.
  • System 100 may include one or more components therein including processor 106, memory 108, image sensor 110, and display 112. The one or more components of system 100 may be located in housing 114.
  • the image sensor 110 may be located on a side of housing 114. In FIG. 1 , for example, image sensor 110 is shown located on side 104.
  • system 100 may include an image sensor on side 102, an image sensor on side 104, or image sensors on both side 102 and side 104. In other embodiments, such as shown in FIG. 2, the system 100 may be in electronic communication with an external image sensor.
  • the display 112 may also be located on a side of housing 114.
  • FIG. 1 shows display 112 located on side 102 of housing 114.
  • system 100 may include display 112 on side 102, side 104, or on both the sides 102, 104.
  • the processor 106 may be a microprocessor, such as that included in a device such as a smart phone.
  • the processor is configured to analyze images from the image sensor 110 to identify moving objects in the images, including slow-moving objects and fast-moving objects, track the identified objects' movement between images, determine differences in the objects' movement compared to reference data, and then display image data corresponding to visual representations of the differences.
  • the display may show images from the image sensor with alert frames overlaid on an identified portion or portions of the moving object (e.g., limbs) indicative of differences in the object’s movement compared to reference data.
  • the processor may be additionally configured to crop images or to mark certain areas of images and exclude those areas from analysis.
  • the processor may receive input from a user input device, for example touch-screen input from the display 112, and, based on the input, determine areas to crop out or exclude from analysis.
  • the types of alerts, including audio components and the shape, color, or effects (such as flashing) of alert boxes on the display 112, may be selected.
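  • The alert-frame overlay described above could be rendered with a general-purpose imaging library; the OpenCV sketch below is illustrative only and assumes a bounding box has already been computed around the limb or joint whose movement differs from the reference data.

```python
import cv2

def draw_alert(frame, box, label, color=(0, 0, 255), flash=False, frame_index=0):
    """Overlay an alert box on a frame (illustrative sketch, not the disclosed method).

    frame: BGR image as a NumPy array.
    box: (x, y, width, height) around the limb or joint to highlight.
    flash: if True, the box is drawn only on alternating frames to create a flashing effect.
    """
    if flash and frame_index % 2:
        return frame
    x, y, w, h = box
    cv2.rectangle(frame, (x, y), (x + w, y + h), color, thickness=2)
    cv2.putText(frame, label, (x, max(y - 8, 12)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 1, cv2.LINE_AA)
    return frame
```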
  • Memory 108 is a non-transitory computer readable data storage device.
  • the memory 108 can be, for example, a flash memory, magnetic media, optical media, random access memory, etc.
  • the memory 108 is configured to store data corresponding to images captured by the image sensor 110.
  • the images captured by the image sensor 110 may include additional data such as timestamps, or may be modified by the processor, for example cropping the image files or marking zones of the image files as excluded from analysis by the processor 106.
  • the image sensor 110 may be, for example, a digital camera.
  • the image sensor 110 may capture a series of images over a time period.
  • the image sensor 110 may capture video corresponding to a series of images over the time period.
  • the image sensor 110 may also add additional data to the captured images, such as metadata, for example timestamps.
  • the image sensor 110 may be an infrared camera or a thermal imaging sensor.
  • the image sensor 110 is a digital sensor.
  • the image sensor 110 includes an image stabilization feature.
  • the frame rate and/or the resolution of the image sensor 110 affects a sensory range of the system. A greater frame rate may increase the sensory range of the system. A greater resolution may increase the sensory range of the system.
  • Display 112 may be a display device or component which includes light emitting diodes (LED), organic light emitting diodes (OLED), liquid crystal display (LCD), and other like types of display devices.
  • the display 112 can be a component of a smart phone or a tablet device.
  • Display 112 receives processed image data from the processor 106 and displays the processed image data.
  • the display 112 may include a user input feature, for example where the display is a touchscreen device.
  • the input feature may be used, for example, to define regions of the images to exclude from analysis, or to select options and set parameters for those options such as distance and velocity thresholds for alarms or time windows for performing tracking operations.
  • Housing 114 may be a metal and/or plastic casing covering the processor 106 and memory 108, and with at least one image sensor 110 disposed on side 104 and the display 112 disposed on side 102.
  • the housing 114, image sensor 110, memory 108, and processor 106 may be, for example, a computing device including a smart phone device.
  • FIG. 2 is a schematic diagram illustrating a non-limiting example of a system 200, according to some embodiments.
  • System 200 includes processor 202, memory 204, and multiple image sensors 206, which may be in electronic communicable connection with each other.
  • the processor 202 and the memory 204 may be located separately from the multiple image sensors 206.
  • System 200 may include a display device for displaying image data. In some embodiments, the display may be located separately from processor 202 and memory 204.
  • System 200 may be in electronic communicable connection with one or more other computing devices such as, for example, system 100 in FIG. 1 , and the other computing device may display image data on its display.
  • the processor 202, memory 204, and multiple image sensors 206 may be in electronic communicable connection through one or more different types of electronic connections.
  • one or all of the processor 202, memory 204, and multiple image sensors 206 may also be in electronic communicable connection with other computing devices such as system 100 through one or more different types of electronic connections.
  • the components of system 200 may be in electronic communicable connection through wired connections and/or wireless connections including, but not limited to, Wi-Fi, Bluetooth, Bluetooth Low Energy (BLE), 5G, 4G, LTE, CDMA, other wireless communication protocols, or any combinations thereof.
  • the wired connections can include USB, ethernet, and the like.
  • the multiple image sensors 206 may include a fixed camera.
  • the camera can be mounted on a vehicle (e.g., land vehicle, water vehicle, or air vehicle).
  • the camera can be mounted to a wired suspension assembly configured to move the camera along one or more axis to capture images of persons moving on a sporting field.
  • the camera can be mounted on an aerial drone device.
  • the camera may be portable and located in a housing that may be fixed to another object, such as a static object, for example a tree, or a movable object, for example a helmet.
  • the camera may be located in a housing that may be hand-held.
  • the multiple image sensors 206 may be part of a single three-dimensional (3-D) camera such as a stereoscopic camera.
  • the distance to an object or its approximate size may be determined based on the images from each of the image sensors.
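  • For the two-sensor case, one conventional way to estimate distance is stereo triangulation from the horizontal disparity between matched image points; the sketch below assumes calibrated, rectified cameras and is offered only as background, not as the specific method used by the disclosed system.

```python
def stereo_distance(x_left_px, x_right_px, focal_length_px, baseline_m):
    """Estimate the distance to a point seen by two rectified cameras.

    Uses the standard relation Z = f * B / d, where d is the disparity in pixels,
    f the focal length in pixels, and B the distance between the two sensors.
    """
    disparity = x_left_px - x_right_px
    if disparity <= 0:
        raise ValueError("point must have positive disparity")
    return focal_length_px * baseline_m / disparity

# Example: a joint at pixel column 640 in the left image and 610 in the right image,
# with a 1000 px focal length and a 0.12 m baseline, is roughly 4 m from the cameras.
print(stereo_distance(640, 610, 1000, 0.12))  # ~4.0
```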
  • the system 200 may include an additional user input tool, for example a keyboard and/or mouse in communication with the processor 202, or a user input tool integrated into components of the system 200, such as a display device similar to display 112 having touch-screen functionality.
  • These user input tools may be used for certain options, such as selecting rules for alarms or notifications, defining areas to exclude from analysis, or activating or deactivating the tracking functionality for particular periods of time, for example disabling object movement tracking and analysis between different intervals.
  • the display may be a two-dimensional display screen such as an LED, LCD, or OLED display.
  • the display may be a VR device such as a headset, or an AR device such as a head-mounted display with a translucent or transparent screen.
  • FIG. 3 is a flow diagram of a method 300 for training models for performing image processing techniques, according to some embodiments.
  • the method 300 includes establishing a visual environment in images.
  • Establishing the visual environment may include identifying static or non-moving areas that are not tracked and identifying moving objects in the images for tracking.
  • Establishing the visual environment may include classifying objects in the images.
  • the visual environment may be the vertical and horizontal area captured by an image sensor such as, for example, image sensor 110 in FIG. 1 . That is, the visual environment may represent the visual range of the image sensor.
  • the visual environment may correspond to the characteristics of images provided to a user via a display such as, for example, display 112 in FIG. 1.
  • the images provided to the user via the display may include, for example, the field of view and the resolution of the displayed image.
  • Identified static or non-moving areas may be determined by a lack of change in a portion of the visual environment (e.g., pixel or pixels) across multiple frames of the visual environment captured by the image sensor.
  • the aspects of the static or non-moving areas that do not change may include the color, hue, shade or brightness of those areas.
  • Static or non-moving areas may include pixels representative of the ground and fixed features such as, for example, trees, structural members, and the like, which are not moving relative to the image sensor.
  • the image sensors may be at an elevated fixed position and may capture images of the moving objects and the ground during an application such as a horse race. Static or non-moving areas do not trigger any alerts or tracking, as will be further described herein.
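  • One simple, assumed way to separate static areas from moving areas is per-pixel frame differencing over a short window, as sketched below; the threshold value is a hypothetical placeholder that would depend on the use case.

```python
import numpy as np

def static_area_mask(frames, change_threshold=8.0):
    """Return a boolean mask of pixels that do not change across a stack of frames.

    frames: array of shape (num_frames, height, width) of grayscale intensities.
    A pixel is treated as static when its largest frame-to-frame change in brightness
    stays below the threshold; such areas can be excluded from alerting and tracking.
    """
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0))   # (num_frames-1, H, W)
    return diffs.max(axis=0) < change_threshold
```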
  • the method 300 includes identifying objects in the visual environment of the images. Identifying the objects may include identifying regions in the image or images corresponding to the background and one or more moving objects in the image.
  • the image may include one or more objects moving in the image or images.
  • the objects may include, for example, first object, second object, and through nth objects captured in an image or series of images during a certain time period.
  • one or more objects may be captured in images by an image sensor and displayed on a display device such that a target object for tracking may be identified based on one or more inputs received at an input device.
  • the inputs may correspond to a user selection of the target object.
  • the inputs may correspond to a selection of a region of the image including therein at least a portion of the target object, and the one or more techniques herein can analyze the selected region and identify the target object.
  • Identifying the objects in the images may include identifying regions in the image or series of images corresponding to pixels representative of objects, or pixels representative of a target object among one or more objects. For example, a boundary may be identified in the images defining the object, the boundary including the object’s torso and any appendages. Identifying the objects may include classifying objects. For example, the images may capture more than one moving object in the image. In this regard, one or more objects and/or one or more different types of objects including a target object may be identified in the images for tracking. For example, the tracking may be performed on a particular object, e.g., athlete, in a group of other similar objects, e.g., other athletes, in the images.
  • user input may be used to define regions which are excluded from the analysis. Excluded regions may include, for example, areas unlikely to contain an object of interest, for example the ground, areas with an excess of distractors such as a crowd in the background, or areas which are not useful to the user such as outside a playing surface.
  • the user input for defining the region may be interaction with a display or user interface showing the view of the image sensor, with the input being a touch-screen input such as drawing a region to exclude or using two or more fingers to define the region to be excluded, or selection using a cursor such as one controlled by a mouse, track-pad, joy-stick or other such control interfacing with the processor and the display.
  • the excluded regions may be removed from the portions of the image which are analyzed, may be treated as non-moving or static areas (e.g., no movement areas), or may be treated as distractors.
  • the excluded regions may be cropped out of the image sensor images prior to their processing in establishing the visual environment.
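  • A hedged sketch of how user-defined excluded regions might be applied before analysis follows; the polygon format and the choice to zero out the region (rather than crop it) are assumptions made only for illustration.

```python
import numpy as np
import cv2

def apply_exclusions(frame, excluded_polygons):
    """Zero out user-defined regions so they are ignored when establishing the visual environment.

    frame: BGR or grayscale image as a NumPy array.
    excluded_polygons: list of (N, 2) arrays of pixel coordinates, e.g. regions drawn by
    the user on a touch screen around the crowd or the ground.
    """
    mask = np.ones(frame.shape[:2], dtype=np.uint8)
    for polygon in excluded_polygons:
        cv2.fillPoly(mask, [np.asarray(polygon, dtype=np.int32)], 0)
    return frame * (mask[..., None] if frame.ndim == 3 else mask)
```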
  • the method 300 includes identifying joints for the target object and associating the joints with the moving object in the visual environment. Identifying the joints may include identifying, based on the region of pixels identified as representing the target moving object, tracklets corresponding to appendages and joints of the object identified based on rendering a frame of the object. For example, the joints may correspond to elbows, knees, ankles, and wrists of a person being tracked in the images.
  • a target object may include, for example, a torso, a first appendage having a first joint associated therewith, a second appendage having a first and second joint associated therewith, and so on through an nth appendage having an nth joint associated therewith.
  • Identifying the set of joints may include identifying, for each image, a set of tracklets based on a frame of the object, each tracklet in the set of tracklets being associated with a respective limb of the object or a segment extending between respective joints at the ends of the tracklet.
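  • Under one possible reading, a tracklet can be represented simply as the segment between the two joints at its ends in a given image; the sketch below assumes per-image joint coordinates are already available from a pose-estimation step and uses hypothetical joint names.

```python
import numpy as np

# Hypothetical limb definitions: each tracklet spans the two joints at its ends.
LIMB_PAIRS = [("hip", "knee"), ("knee", "ankle"), ("shoulder", "elbow"), ("elbow", "wrist")]

def tracklets_for_frame(joints):
    """Build per-limb tracklets for one image.

    joints: dict mapping joint name -> (x, y) or (x, y, z) coordinates for this frame.
    Returns a dict mapping limb name -> (start_point, end_point, length).
    """
    tracklets = {}
    for start, end in LIMB_PAIRS:
        if start in joints and end in joints:        # occluded joints are simply skipped
            a, b = np.asarray(joints[start]), np.asarray(joints[end])
            tracklets[f"{start}-{end}"] = (a, b, float(np.linalg.norm(b - a)))
    return tracklets
```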
  • identifying the joints may include determining a 2-D frame of the moving object based on the 2-D images, and rendering a 3-D frame of the object based on applying one or more algorithms to the 2-D frame to map the object in a 3-D coordinate system, the rendering process for generating the 3-D model being capable of compensating for occlusion due to using 2-D images as input.
  • the method 300 includes determining a pattern of motion of the target object based on a movement of the joints in the visual environment and generating an output dataset corresponding to the pattern of motion of the object.
  • the pattern of motion may include a joint range of the moving object. Determining the joint range may include tracking a position of the object's joints and determining a pattern of motion of the target object based on determining rotational limits, acceleration, and speed of the joints. This pattern data, including the relative joint positions, rotational limits, acceleration, and speed of the joints, may be stored in the memory. Determining the joint range may also include, based on the pattern of motion of the joints, determining an overall speed relative to the real-life environment, acceleration data, and direction of the moving object. In some embodiments, determining the pattern of motion for the object may include calculating movement of the respective set of tracklets between each image frame of the set of images to determine the pattern of motion for the object.
  • Determining the joint range may include identifying areas or regions associated with the object and its appendages in an environment such as, for example, the visual environment, multi-dimensional coordinate system, or some other environment, where a rate of change in one or more aspects is greater than zero or a minimum value. In some embodiments, the rate of change may be determined based on exceeding a threshold value.
  • the aspects which may change in the pixels of the object may include the color, hue, shade, and/or brightness of that area or object. In an embodiment, a determination is made that an object is moving based on whether one or more threshold values for the rate of change of the aspects of that area or object are satisfied.
  • the size of the area of the visual environment where the aspects are changing may also be used to determine whether an object is moving and the rate at which the object or its appendages and joints are moving.
  • the threshold values for rate of change and/or the area over which aspects change may, in an embodiment, be one or more predetermined values.
  • the predetermined values may be selected, for example, based on the use case in which this method is applied, for example using one set of threshold values for an embodiment based on tracking a racehorse performing a sequence of movements, while an embodiment for use in monitoring flight patterns of birds has a different set of predetermined threshold values.
  • the threshold value is selected (set) by a user via a user interface (e.g., a graphical user interface displayed on a display).
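  • A minimal sketch of the joint-range calculation described above, assuming per-frame joint positions are already available; the threshold value is a placeholder that would be tuned per use case (e.g., racehorses versus birds) or set by the user as noted above.

```python
import numpy as np

def joint_kinematics(positions, fps=30.0):
    """Compute per-joint speed and acceleration from tracked positions.

    positions: array of shape (frames, joints, dims) of joint coordinates.
    Returns speed of shape (frames-1, joints) and acceleration of shape (frames-2, joints).
    """
    dt = 1.0 / fps
    speed = np.linalg.norm(np.diff(positions, axis=0), axis=2) / dt
    acceleration = np.diff(speed, axis=0) / dt
    return speed, acceleration

def moving_joints(speed, speed_threshold=0.05):
    """Flag joints whose peak speed exceeds a (use-case dependent) threshold."""
    return speed.max(axis=0) > speed_threshold
```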
  • These moving areas may include large areas including therein one or more objects, such as in some examples, birds flying in a flock or stars moving across the sky.
  • objects may, in some embodiments, be user defined, for example through selection of regions via a user interface such as a touch screen.
  • there may be an initial alert such as a presentation of a box around the object.
  • the moving objects are tracked continuously. As each image is processed, the object corresponding to the target object is identified and tracked as the target object moves in the image or in each frame of a series of images. When monitored, objects or areas corresponding to the target object may be identified, and in some embodiments, a tracking symbol such as an alert box may be displayed over the target object.
  • the method 300 includes recording training data corresponding to movement data generated based on tracking the target object in the images.
  • the movement data may include, but is not limited to, joint position data, pattern data, rotational limits, accelerations, joint speed, joint direction, overall speed, overall accelerations, overall direction, metadata, other movement characteristics, or any combinations thereof.
  • the data may include other types of data including object type classifications, species, gender, classifications of inanimate or non-moving objects, other definitions, or any combinations thereof.
  • the training data may be utilized by the one or more techniques to enable performing the object tracking using one or more models as described herein.
  • One or more data points generated based on the tracking may also be combined with the training data to iteratively update the reference data to provide improved object tracking functionality by the one or more techniques and one or more models.
  • the models may be trained using the training data and updating the training data iteratively updates the models to provide improved functionality such as improved object classification, tracking, joint identification, pattern determination, and determination of differences between new images and the training data, as will be further described herein.
  • any of the training data is used to train a machine-learning device, system, or both to produce an improved device, system, or both.
  • any of the training data is used to train an artificial intelligence (AI) device, system, or both to produce an improved device, system, or both.
  • the training data is not used to train an AI device, system, or both.
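  • By way of illustration only, the recorded movement data could be used to fit an off-the-shelf classifier that maps motion features to labels such as known physical issues; the feature layout and the use of scikit-learn are assumptions, not part of the disclosure.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_motion_model(feature_rows, labels):
    """Train a classifier on recorded movement data (illustrative sketch).

    feature_rows: array of shape (samples, features), e.g. per-sequence summaries of
    joint ranges, rotational limits, speeds, and accelerations.
    labels: array of shape (samples,), e.g. "normal" or a known physical issue.
    """
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(np.asarray(feature_rows), np.asarray(labels))
    return model

# The model can be re-fit as new tracking sessions add rows to the training data,
# which is one way to realize the iterative updating described above.
```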
  • the method 300 may include displaying images from the image sensor and/or any of the data generated based on performing the image processing techniques at a display device.
  • the display may show the images from the image sensor, including an outline of the target object.
  • the display may show the images and a frame associated with the target object, and data values corresponding to joint positions, speed, rotation, direction, and the like.
  • FIG. 4 is a flow diagram of a method 400 for performing the image processing techniques, according to some embodiments.
  • the method 400 includes obtaining an image or obtaining a series of images from an image sensor.
  • the image sensor may correspond to, for example, image sensor 110 in FIG. 1 .
  • the image sensor may correspond to at least one of the multiple image sensors 206 in FIG. 2.
  • obtaining the image may include receiving a first dataset comprising image data corresponding to a plurality of images from at least one image sensor.
  • the images may be captured during a period of time occurring after the period of time the images were processed at block 302. That is, the images and the processing of the images to generate the training data, as shown in FIG. 3, may correspond to historical image data captured and/or processed during a period of time occurring before the time period the images at block 402 were captured.
  • the method 400 includes analyzing the image data to establish a visual environment based on the images.
  • the method 400 for establishing the visual environment may be similar to operations performed at block 302 in FIG. 3, according to some embodiments.
  • a model such as, for example, a computer vision model trained on the training data may be utilized to analyze the images and determine the visual environment based on the images.
  • the method 400 includes identifying objects in the visual environment of the images, the objects including a target moving object.
  • the target object may be identified from a plurality of different objects or different types of objects.
  • the method 400 for identifying the objects in the visual environment may be similar to the operations performed at block 304 in FIG. 3, according to some embodiments.
  • the model such as, for example, the computer vision model trained on the training data may be utilized to analyze the images and identify the objects in the visual environment.
  • identifying the objects includes classifying the moving object or objects based on a comparison of the target objects to the training dataset.
  • the method 400 may include obtaining the training dataset such as, for example, from block 310 in FIG. 3, to perform the object classification.
  • the method 400 includes identifying joints of the target object in the visual environment of the images.
  • the method 400 for identifying joints of the target object in the visual environment may be similar to operations performed at block 306 in FIG. 3, according to some embodiments.
  • the model such as, for example, the computer vision model trained on the training data may be utilized to analyze the images and identify the joints of the target object as the object moves in the visual environment of the images.
  • the method 400 may include identifying a set of skeletal joints corresponding to a body of the target object and corresponding to, for example, a torso and limbs of the target moving object.
  • the method 400 includes identifying a joint range of the target object in the visual environment of the images.
  • the method 400 for identifying the joint range of the target object in the visual environment may be similar to operations performed at block 308 in FIG. 3, according to some embodiments.
  • the model such as, for example, the computer vision model trained on the training data may be utilized to analyze the images and identify the joint range of the target object as the object moves in the visual environment of the images.
  • identifying the joint range may include tracking one or more factors including, but not limited to, a position, rotation, acceleration, and direction of the set of skeletal joints.
  • the method 400 includes comparing the data generated based on processing the images at blocks 402, 404, 406, 408, and 410 to the training data.
  • the training data may be obtained prior to, or during, block 412.
  • the training data may correspond to, for example, training data generated at block 310 in FIG. 3.
  • comparing the dataset to the training dataset further includes obtaining a set of second tracklets from the training dataset and comparing the set of tracklets determined at blocks 408, 410 with the set of second tracklets to calculate the differences therebetween.
  • the method 400 includes identifying differences between the data generated from tracking the target moving object at blocks 402, 404, 406, 408, and 410, relative to the training data. These differences may correspond to differences in a relative position of the object in the visual environment, differences in position of the object's torso and limbs, proportional differences, rotational differences, acceleration differences, speed differences, other differences, or any combinations thereof. Identifying these differences, or deltas, may include performing one or more calculations to identify these differences.
  • the training data may include therein corresponding data associated with the target object, one or more different objects similar in type to the target object, one or more different types of objects than the target object, or any combinations thereof.
  • the method 400 may include determining that one or more of the differences between the data corresponding to the pattern of motion and the training data exceed a predetermined threshold value or values, which is indicative of an anomalous movement sequence by the object.
  • identifying that the differences between the data generated from tracking the target moving object and the training data exceed the predetermined threshold value or values may be indicative of physical issues with the object.
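  • Under the assumptions already introduced, the comparison and threshold check described above might reduce to element-wise deltas with per-metric thresholds, as in the hypothetical sketch below.

```python
import numpy as np

def flag_anomalies(differences, thresholds):
    """Flag metrics whose difference from the training data exceeds a predetermined threshold.

    differences: dict mapping metric name (e.g. "speed", "rotational") -> per-joint difference values.
    thresholds: dict mapping the same metric names -> scalar threshold values.
    Returns a dict of metric -> boolean array marking joints with anomalous movement.
    """
    return {metric: np.asarray(delta) > thresholds[metric]
            for metric, delta in differences.items()}

# Example: flag joints whose rotational difference from the reference exceeds 15 degrees.
flags = flag_anomalies({"rotational": [3.2, 21.5, 7.8]}, {"rotational": 15.0})
# -> {"rotational": array([False,  True, False])}
```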
  • the method 400 may include displaying images from the image sensor and/or any of the data generated based on performing the image processing techniques of method 400 at a display device.
  • the display may show images from the image sensor including an outline of the target object and a visual representation of the differences between the object in the images as compared to the training data (e.g., differences in limb position, rotation, speed, direction, etc.).
  • the display may show images obtained at block 402 overlaid with images from the training data and including data values corresponding to joint positions, speed, rotation, direction, and the like.
  • the display may show a graphical user interface configured to receive inputs from a user to enable performing the image processing techniques in accordance with the present disclosure.
  • displaying image data onto a display device may include displaying a visual indication of differences between the position and direction of the set of skeletal joints of the moving object based on a comparison to the reference dataset.
  • FIG. 5 is a graphical diagram illustrating a non-limiting example of a type of movement which may be detected and tracked, according to some embodiments.
  • images 502 captured by an image sensor or sensors such as, for example, image sensor 110 in FIG. 1 , may be displayed on display device 504.
  • the x-axis 506 is horizontal with respect to the orientation of the image sensor
  • the y-axis 508 is vertical with respect to the orientation and position of the image sensor
  • the z-axis 510 is the direction from the image sensor to the tracked object.
  • the axes of movement of detected objects are relative to the image captured by the image sensor or sensors.
  • Each frame of the images may show the target object in a respective position such that the series of images may be translated to movement of the target object or movement of portions of the target object.
  • the object's movement corresponds to movement of the object's appendages (limbs) and joints relative to the x-axis 506, y-axis 508, and z-axis 510.
  • the display device 504 shows a visual environment 512 determined based on the images captured by the image sensor or sensors, and a target object 514 identified in the images for tracking.
  • the display device 504 may also show, according to some embodiments, a mapping of a frame 516 and corresponding joints 518 associated with the object 514 in the visual environment 512.
  • the joint range and pattern of movement of object 514 may be determined by calculating the translation of the joints 518 along a multi-axial coordinate system such as, for example, x-axis 506, y-axis 508, and z-axis 510 to determine factors such as, for example, position, proportion, rotation, speed, acceleration, direction, other like factors, or any combinations thereof.
  • This data can then be used as training data to train models to perform the image processing techniques in accordance with the present disclosure, or compared to training data to determine differences in movement of the object 514 as compared to object(s) in the training data.
  • the term “between” does not necessarily require being disposed directly next to other elements. Generally, this term means a configuration where something is sandwiched by two or more other things. At the same time, the term “between” can describe something that is directly next to two opposing things.
  • a particular structural component being disposed between two other structural elements can be: disposed directly between both of the two other structural elements such that the particular structural component is in direct contact with both of the two other structural elements; disposed directly next to only one of the two other structural elements such that the particular structural component is in direct contact with only one of the two other structural elements; disposed indirectly next to only one of the two other structural elements such that the particular structural component is not in direct contact with only one of the two other structural elements, and there is another element which juxtaposes the particular structural component and the one of the two other structural elements; disposed indirectly between both of the two other structural elements such that the particular structural component is not in direct contact with both of the two other structural elements, and other features can be disposed therebetween; or any combination(s) thereof.
  • a system comprising: a processor; and a non-transitory computer readable medium having stored therein instructions that are executable by the processor to perform operations for tracking object movement, the operations comprising: establish a visual environment based on a set of images; identify an object in the visual environment; identify a set of joints for the object and associate the set of joints with the object in the visual environment; determine a pattern of motion based on a movement of the set of joints in the visual environment and generate a dataset corresponding to the pattern of motion of the object; compare the dataset to a training dataset; identify differences between the dataset and the training dataset; and display a visual representation of the differences between the dataset and the training dataset at a display device.
  • Clause 2. The system of clause 1, wherein the processor further performs operations comprising: receive image data captured by an image sensor, the image data corresponding to the set of images of a scene comprising the object. Clause 3. The system according to any of clauses 1-2, wherein the processor further performs operations comprising: comparing the differences to a reference dataset to predict a physical issue with the object.
  • Clause 4. The system according to any of clauses 1-3, further comprising: an image sensor, wherein the image sensor captures the set of images; and the display device, wherein the display device displays the set of images.
  • Clause 5. The system according to any of clauses 1-4, wherein the display device further comprises a touch-screen interface integral with the display device configured to receive one or more inputs from a user.
  • Clause 6. The system according to any of clauses 1-5, further comprising: a user input device configured to receive one or more inputs from a user.
  • identify the object in the visual environment further comprises: identifying a first region in the visual environment corresponding to pixels representative of the object, wherein the first region is identified based on an input received from a user corresponding to the object for tracking.
  • identify the object in the visual environment further comprises: identifying a second region in the visual environment corresponding to pixels representative of other objects; and identifying a third region in the visual environment corresponding to pixels representative of a background.
  • identifying the set of joints further comprises: identify, for each image, a set of tracklets based on a skeletal frame of the object, each tracklet in the set of tracklets being associated with a respective limb of the object; and calculate, between each image, movement of the respective set of tracklets to determine the pattern of motion for the object.
  • comparing the dataset to the training dataset further comprises: obtain a set of second tracklets from the training dataset; compare the set of tracklets with the set of second tracklets to calculate the differences therebetween; and determine one or more of the differences exceed a predetermined threshold indicative of an anomalous movement sequence by the object, wherein the differences exceeding the predetermined threshold is indicative of physical issues with the object.
  • Clause 11. The system according to any of clauses 1-10, wherein the processor further performs operations comprising: train a model using training data to enable improved tracking of objects based on the set of images, the training data comprising the dataset corresponding to the pattern of motion of the object.
  • Clause 12. A computer-implemented method comprising: establishing a visual environment based on a set of images; identifying an object in the visual environment; identifying a set of joints for the object and associating the set of joints with the object in the visual environment; determining a pattern of motion based on a movement of the set of joints in the visual environment and generating a dataset corresponding to the pattern of motion of the object; and generating a training dataset comprising data corresponding to the pattern of motion of the object and training a model using the training dataset to enable improved tracking of objects based on captured images.
  • Clause 13. The method of clause 12, the method further comprising: displaying the dataset corresponding to the pattern of motion of the object at a display device to provide a visual representation of the pattern of motion. Clause 14. The method according to any of clauses 12-13, wherein the training dataset further comprises one or more physical issues associated with the patterns of motion.
  • identify the object in the visual environment further comprises: identifying a first region in the images corresponding to pixels representative of the object, wherein the first region is identified based on an input received from a user corresponding to the object for tracking.
  • identifying the set of joints further comprises: identifying, for each image, a set of tracklets based on a skeletal frame of the object, each tracklet in the set of tracklets being associated with a respective limb of the object; and calculating, between each image, movement of the respective set of tracklets to determine the pattern of motion for the object.
  • a non-transitory computer readable medium having stored therein instructions executable by a processor to perform operations for tracking moving objects in a set of images captured by an image sensor, the operations comprising: receive image data from the image sensor, the image data corresponding to a set of images comprising an object; establish a visual environment based on the set of images; identify a first region in the visual environment corresponding to pixels representative of the object; identify a set of joints for the object and associate the set of joints with the object in the visual environment; determine a pattern of motion based on a movement of the set of joints in the visual environment and generate a dataset corresponding to the pattern of motion of the object; compare the dataset to a training dataset and identify differences between the dataset and the training dataset; compare the differences to a reference dataset to predict a physical issue with the object; display a visual representation of the differences between the dataset and the training dataset at a display device; and train a model using training data to enable improved tracking of objects based on the set of images, the training data comprising the dataset corresponding to the pattern of motion of the object.
  • identifying the set of joints further comprises: identify, for each image, a set of tracklets based on a skeletal frame of the object, each tracklet in the set of tracklets being associated with a respective limb of the object; and calculate, between each image, a movement of the respective set of tracklets to determine the pattern of motion for the object.
  • comparing the dataset to the training dataset further comprises: obtain a set of second tracklets from the training dataset; compare the set of tracklets with the set of second tracklets to calculate the differences therebetween; and determine one or more of the differences exceed a predetermined threshold indicative of an anomalous movement sequence by the object, wherein the differences exceeding the predetermined threshold is indicative of physical issues with the object.


Abstract

Systems, devices, and methods for performing object movement tracking using computer-based models. A system may include a processor, and a memory storing instructions executable by the processor to perform operations including establishing a visual environment based on a set of images, identifying an object in the visual environment, identifying a set of joints for the object and associating the set of joints with the object, determining a pattern of motion based on a movement of the set of joints in the visual environment and generating a dataset corresponding to the pattern of motion, comparing the dataset to a training dataset, identifying differences between the dataset and the training dataset, and displaying a visual representation of the differences between the dataset and the training dataset at a display device. The operations may also include comparing the differences to a reference dataset to predict a physical issue with the object.

Description

SYSTEMS, DEVICES, AND COMPUTERIZED METHODS FOR TRACKING AND DISPLAYING MOVING OBJECTS ON MOBILE DEVICES
CROSS-REFERENCE TO RELATED APPLICATIONS
[001] This application claims priority to U.S. Provisional Application 63/641,032, filed on May 1, 2024, the entire disclosure of which is incorporated herein by reference.
FIELD
[002] The present disclosure relates to systems, devices, and computerized methods for tracking and displaying moving objects on, for example, mobile devices.
BACKGROUND
[003] The tracking of moving objects can be applied towards a broad range of applications, for example, when evaluating an object's response to conditioning based on body and limb positions during a series of movements. When performing such evaluations, a user typically relies on visual inspection and/or manual tracking of body and limb positions in an image, and then compares the image to other image(s) to identify differences therebetween.
[004] The accuracy, reliability, and functionality of these techniques may be limited because they require a certain amount of human intervention, and they may be further limited by external factors such as, for example, the variability of objects, visual obstructions in images, and limits on viewing angles in the captured images, which can affect the overall performance of the motion detection and analysis. In addition, the need for user input or feedback to analyze movement patterns and to make determinations on the significance of variances can render the process highly subjective, and the reliability of any determinations made based on such analysis can demonstrate inconsistencies due to internal factors such as, for example, the user's age, bias, experience, and other such factors.
SUMMARY
[005] In some embodiments, a system includes a processor and a non-transitory computer readable medium having stored therein instructions that are executable by the processor to perform operations for tracking object movement, the operations including: establish a visual environment based on a set of images; identify an object in the visual environment; identify a set of joints for the object and associate the set of joints with the object in the visual environment; determine a pattern of motion based on a movement of the set of joints in the visual environment and generate a dataset corresponding to the pattern of motion of the object; compare the dataset to a training dataset; identify differences between the dataset and the training dataset; and display a visual representation of the differences between the dataset and the training dataset at a display device.
[006] In some embodiments, the processor further performs operations including receive image data captured by an image sensor, the image data corresponding to the set of images of a scene including the object.
[007] In some embodiments, the processor further performs operations including comparing the differences to a reference dataset to predict a physical issue with the object.
[008] In some embodiments, the system further includes an image sensor. In some embodiments, the image sensor captures the set of images. In some embodiments, the system further includes a display device. In some embodiments, the display device displays the set of images.
[009] In some embodiments, the display device further includes a touch-screen interface integral with the display device configured to receive one or more inputs from a user.
[0010] In some embodiments, the system further includes a user input device configured to receive one or more inputs from a user.
[0011] In some embodiments, identifying the object in the visual environment further includes identifying a first region in the visual environment corresponding to pixels representative of the object. In some embodiments, the first region is identified based on an input received from a user corresponding to the object for tracking.
[0012] In some embodiments, identifying the object in the visual environment further includes identifying a second region in the visual environment corresponding to pixels representative of other objects; and identifying a third region in the visual environment corresponding to pixels representative of a background.
[0013] In some embodiments, identifying the set of joints further includes: identifying, for each image, a set of tracklets based on a skeletal frame of the object, each tracklet in the set of tracklets being associated with a respective limb of the object; and calculating, between each image, movement of the respective set of tracklets to determine the pattern of motion for the object.
[0014] In some embodiments, comparing the dataset to the training dataset further includes: obtaining a set of second tracklets from the training dataset; comparing the set of tracklets with the set of second tracklets to calculate the differences therebetween; and determining that one or more of the differences exceed a predetermined threshold indicative of an anomalous movement sequence by the object. In some embodiments, the differences exceeding the predetermined threshold are indicative of physical issues with the object.
[0015] In some embodiments, the processor further performs operations including: train a model using training data to enable improved tracking of objects based on the set of images, the training data including the dataset corresponding to the pattern of motion of the object.
[0016] In some embodiments, a computer-implemented method includes: establishing a visual environment based on a set of images; identifying an object in the visual environment; identifying a set of joints for the object and associating the set of joints with the object in the visual environment; determining a pattern of motion based on a movement of the set of joints in the visual environment and generating a dataset corresponding to the pattern of motion of the object; and generating a training dataset including data corresponding to the pattern of motion of the object and training a model using the training dataset to enable improved tracking of objects based on captured images.
[0017] In some embodiments, the method further includes displaying the dataset corresponding to the pattern of motion of the object at a display device to provide a visual representation of the pattern of motion.
[0018] In some embodiments, the training dataset further includes one or more physical issues associated with the patterns of motion.
[0019] In some embodiments, identifying the object in the visual environment further includes identifying a first region in the images corresponding to pixels representative of the object. In some embodiments, the first region is identified based on an input received from a user corresponding to the object for tracking.
[0020] In some embodiments, identifying the object in the visual environment further includes identifying a second region in the images corresponding to pixels representative of other objects, and identifying a third region in the images corresponding to pixels representative of a background.
[0021] In some embodiments, identifying the set of joints further includes: identifying, for each image, a set of tracklets based on a skeletal frame of the object, each tracklet in the set of tracklets being associated with a respective limb of the object; and calculating, between each image, movement of the respective set of tracklets to determine the pattern of motion for the object.
[0022] In some embodiments, a non-transitory computer readable medium having stored therein instructions executable by a processor to perform operations for tracking moving objects in a set of images captured by an image sensor, the operations including: receive image data from the image sensor, the image data corresponding to a set of images including an object; establish a visual environment based on the set of images; identify a first region in the visual environment corresponding to pixels representative of the object; identify a set of joints for the object and associate the set of joints with the object in the visual environment; determine a pattern of motion based on a movement of the set of joints in the visual environment and generate a dataset corresponding to the pattern of motion of the object; compare the dataset to a training dataset and identify differences between the dataset and the training dataset; compare the differences to a reference dataset to predict a physical issue with the object; display a visual representation of the differences between the dataset and the training dataset at a display device; and train a model using training data to enable improved tracking of objects based on the set of images, the training data including the dataset corresponding to the pattern of motion of the object.
[0023] In some embodiments, identifying the set of joints further includes: identifying, for each image, a set of tracklets based on a skeletal frame of the object, each tracklet in the set of tracklets being associated with a respective limb of the object; and calculating, between each image, a movement of the respective set of tracklets to determine the pattern of motion for the object.
[0024] In some embodiments, comparing the dataset to the training dataset further includes: obtaining a set of second tracklets from the training dataset; comparing the set of tracklets with the set of second tracklets to calculate the differences therebetween; and determining that one or more of the differences exceed a predetermined threshold indicative of an anomalous movement sequence by the object. In some embodiments, the differences exceeding the predetermined threshold are indicative of physical issues with the object.
BRIEF DESCRIPTION OF THE DRAWINGS
[0025] Some embodiments of the disclosure are herein described, by way of example only, with reference to the accompanying drawings. With specific reference now to the drawings in detail, it is stressed that the embodiments shown are by way of example and for purposes of illustrative discussion of embodiments of the disclosure. In this regard, the description taken with the drawings makes apparent to those skilled in the art how embodiments of the disclosure may be practiced.
[0026] FIG. 1 is a schematic diagram of a non-limiting example of a system, according to some embodiments.
[0027] FIG. 2 is a schematic diagram of a non-limiting example of a system, according to some embodiments.
[0028] FIG. 3 is an exemplary flow diagram of a method for training models for performing image processing techniques, according to some embodiments.
[0029] FIG. 4 is an exemplary flow diagram of a method for performing the image processing techniques, according to some embodiments.
[0030] FIG. 5 is a graphical diagram illustrating a non-limiting example of a type of movement which may be detected and tracked, according to some embodiments.
DETAILED DESCRIPTION
[0031] Various embodiments of the present disclosure are directed to tracking of moving objects in images in near real-time using computer-based models including, for example, image processing models. The advancements described herein may be provided in computing devices such as, for example, in mobile computing devices. The computing device may include a processor, a non-transitory computer readable media device such as, for example, a memory or hard-disk drive, and one or more sensor devices for capturing images such as, for example, a camera. According to some embodiments, the computing devices can include instructions stored in the memory such as, for example, software applications executable by the processor to perform the various embodiments described herein.
[0032] According to some embodiments, the models utilized in these systems and computing devices may include, for example, machine vision models that can replace vision/recognition tasks traditionally performed by humans or tasks that traditionally needed a certain amount of human input or feedback to process the image processing techniques in real-time. The image processing techniques that may be applied by these models can include, according to some embodiments, processing images or a series of images to identify and track an object or objects in the images. In some embodiments, the image processing techniques applied by the models can include processing the images to identify and track the movement of objects in a visual environment generated based on the images, to identify and track the movement of portions of these objects (e.g., arms, legs, wings, etc.), and to make predictions of a condition of the objects based on the analysis. According to some embodiments, the moving objects captured in these images may be in fields such as, for example, sports, physical therapy, physical conditioning, medical diagnostics, cell biology, astronomy, ornithology, equestrian sports, and the like. Such objects may include, for example, humans, horses, birds, reptiles, and other like objects. According to some embodiments, the objects captured in the images may be at a cellular level. According to some embodiments, the objects may include astronomical objects, vehicles, automobiles, motorcycles, bicycles, airplanes, drones, and other like objects.
[0033] Systems including these models can apply the image processing techniques to track object(s) in the captured images or video such as, for example, an athlete’s movements. According to some embodiments, the images may be captured by an external device such as, for example, a handheld DSLR camera, and a computing device may obtain the images from the external device to be processed using the models described herein. The image processing techniques may be used to compare images of an object’s movement either between images or between sets of images, such as between images captured during different points in time to make predictions associated with the object based on the images. For example, the system may process images capturing a person currently undergoing physical therapy performing a sequence of movements to make certain predictions of the type of injury or to provide recommendations for therapeutic exercises that can be performed to help correct for any identified issues.
[0034] Various embodiments of the present disclosure relate to systems, devices, computer-implemented methods, and non-transitory computer readable media for performing the image processing techniques for tracking moving objects in images. According to various embodiments, the techniques can include determining data corresponding to differences in the object's movement based on a comparison of the images to reference data, and visually displaying the differences between the object's movement in the images and the reference data. For example, a mobile computing device (e.g., mobile cellular phone) having a camera and a display may be used to capture images of a horse running on a track and the mobile computing device may process the images to identify differences between the horse's movement in the images compared to historical reference data and may visually display the differences on the display of the mobile computing device in near real-time. According to some embodiments, the image processing techniques for tracking of moving objects in images may include calculating proportional differences, rotational differences, acceleration differences, speed differences, other like data, or any combinations thereof, between the moving object in the images and the reference data. For example, a user may utilize these image processing techniques to track an athlete's swing sequence in images captured during different time periods and view differences in the athlete's swing sequence by displaying the images captured during one time period relative to images captured during another time period.
[0035] According to some embodiments, the image processing techniques can include determining the object is demonstrating movements associated with certain physical issues based on comparing the object’s tracked movements from the captured images to reference data and determining data corresponding to differences between the object’s movement in the captured images and the reference data. According to some embodiments, the reference data may include data of similar movements by the same athlete. In some embodiments, the reference data may include historical image data of the same object that is captured in the images being processed. In other embodiments, the reference data may include historical image data of objects that are similar in type to the object captured in the images being processed. Based on the difference data generated based on the object’s tracked movements and based on data corresponding to similar movements of objects from image data in the reference dataset, a prediction associated with the object can be made. For example, the system may provide a prediction, based on the athlete’s tracked movements in the images and based on the processing of those images, that the athlete displays indications of decreased mobility in their leg due to, for example, a pulled hamstring muscle, or some other type of physical issue that affects the athlete’s movement.
[0036] In various embodiments, the image processing techniques for tracking of moving objects in images may include generating and providing display data to a display device, the display data corresponding to visual representations highlighting variances in the object's movement compared to reference data such as, for example, historical image data, the variances being determined based on comparing features extracted from the captured images to the reference dataset. In some embodiments, the reference data may correspond to pre-captured image data of different object types including, for example, data of the certain moving object being analyzed in the images. In this regard, the system may, based on the data corresponding to differences determined between the image data and the reference data, generate and output display data corresponding to the visual representations of real-time highlighting in the captured image or images of the moving object, or may display such data on a display in electronic communication with the system. The highlighting may, according to some embodiments, be indicative of one or more differences including, but not limited to, joint limit, joint speed, joint acceleration, joint distance, object speed, object acceleration, object distance, other indications, or any combinations thereof.
[0037] Among those benefits and improvements that have been disclosed, other objects and advantages of this disclosure will become apparent from the following description taken in conjunction with the accompanying figures. Detailed embodiments of the present disclosure are disclosed herein; however, it is to be understood that the disclosed embodiments are merely illustrative of the disclosure that may be embodied in various forms. In addition, each of the examples given regarding the various embodiments of the disclosure is intended to be illustrative, and not restrictive.
[0038] FIG. 1 is a schematic diagram illustrating a non-limiting example of a system 100, according to some embodiments. System 100 may be a computing device such as, for example, a personal computing device associated with a user. In some embodiments, system 100 may be a mobile computing device such as, for example, a smart cellular telephone, tablet, laptop, personal digital assistant (PDA), augmented reality (AR) device such as a headset, or other like devices.
[0039] System 100 may include side 102 and side 104 opposite the side 102. In FIG. 1, both the sides 102, 104 of system 100 are shown for simplicity purposes. System 100 may include one or more components therein including processor 106, memory 108, image sensor 110, and display 112. The one or more components of system 100 may be located in housing 114. The image sensor 110 may be located on a side of housing 114. In FIG. 1, for example, image sensor 110 is shown located on side 104. Although not shown in the figures, in some embodiments, system 100 may include the image sensor on side 102, image sensor on side 104, or image sensors on both side 102 and side 104. In other embodiments, such as shown in FIG. 2, the system 100 may be in electronic communication with an external image sensor. The display 112 may also be located on a side of housing 114. For example, FIG. 1 shows display 112 located on side 102 of housing 114. In some embodiments, system 100 may include display 112 on side 102, side 104, or on both the sides 102, 104.
[0040] The processor 106 may be a microprocessor, such as that included in a smart phone. The processor is configured to analyze images from the image sensor 110 to identify moving objects in images, including slow-moving objects and fast-moving objects, to track the identified object's movement between images, to determine differences in the object's movement compared to reference data, and then to display image data corresponding to visual representations of the differences. For example, the display may show images from the image sensor with alert frames overlaid on an identified portion or portions of the moving object (e.g., limbs) indicative of differences in the object's movement compared to reference data. In some embodiments, the processor may be additionally configured to crop images or to mark certain areas of images and exclude those areas from analysis. For example, objects other than the target object may be excluded from analysis. In some embodiments, the processor may receive input from a user input device, for example touch-screen input from the display 112, and, based on the input, determine areas to crop out or exclude from analysis. In some embodiments, the types of alerts, including audio components and the shape, color, or effects (such as flashing) of alert boxes on the display 112, may be selected.
[0041] Memory 108 is a non-transitory computer readable data storage device. The memory 108 can be, for example, a flash memory, magnetic media, optical media, random access memory, etc. The memory 108 is configured to store data corresponding to images captured by the image sensor 110. In some embodiments, the images captured by the image sensor 110 may include additional data such as timestamps, or may be modified by the processor, for example cropping the image files or marking zones of the image files as excluded from analysis by the processor 106.
[0042] The image sensor 110 may be, for example, a digital camera. The image sensor 110 may capture a series of images over a time period. The image sensor 110 may capture video corresponding to a series of images over the time period. The image sensor 110 may also add additional data to the captured images, such as metadata, for example timestamps. In some embodiments, the image sensor 110 may be an infrared camera or a thermal imaging sensor. In an embodiment, the image sensor 110 is a digital sensor. In an embodiment, the image sensor 110 includes an image stabilization feature. The frame rate and/or the resolution of the image sensor 110 affects a sensory range of the system. A greater frame rate may increase the sensory range of the system. A greater resolution may increase the sensory range of the system.
[0043] Display 112 may be a display device or component which includes light emitting diodes (LED), organic light emitting diodes (OLED), liquid crystal display (LCD), and other like types of display devices. For example, the display 112 can be a component of a smart phone or a tablet device. Display 112 receives processed image data from the processor 106 and displays the processed image data. The display 112 may include a user input feature, for example where the display is a touchscreen device. The input feature may be used, for example, to define regions of the images to exclude from analysis, or to select options and set parameters for those options such as distance and velocity thresholds for alarms or time windows for performing tracking operations.
[0044] Housing 114 may be a metal and/or plastic casing covering the processor 106 and memory 108, with at least one image sensor 110 disposed on side 104 and the display 112 disposed on side 102. Together, the housing 114, image sensor 110, memory 108, and processor 106 may form, for example, a computing device such as a smart phone device.
[0045] FIG. 2 is a schematic diagram illustrating a non-limiting example of a system 200, according to some embodiments. System 200 includes processor 202, memory 204, and multiple image sensors 206, which may be in electronic communicable connection with each other. The processor 202 and the memory 204 may be located separately from the multiple image sensors 206. System 200 may include a display device for displaying image data. In some embodiments, the display may be located separately from processor 202 and memory 204.
[0046] System 200 may be in electronic communicable connection with one or more other computing devices such as, for example, system 100 in FIG. 1 , and the other computing device may display image data on its display. The processor 202, memory 204, and multiple image sensors 206 may be in electronic communicable connection through one or more different types of electronic connections. In some embodiments, one or all of the processor 202, memory 204, and multiple image sensors 206 may also be in electronic communicable connection with other computing devices such as system 100 through one or more different types of electronic connections. The components of system 200 may be in electronic communicable connection through wired connections and/or wireless connections including, but not limited to, Wi-Fi, Bluetooth, Bluetooth Low Energy (BLE), 5G, 4G, LTE, CDMA, other wireless communication protocols, or any combinations thereof. For example, the wired connections can include USB, ethernet, and the like.
[0047] In some embodiments, the multiple image sensors 206 may include a fixed camera. The camera can be mounted on a vehicle (e.g., land vehicle, water vehicle, or air vehicle). For example, the camera can be mounted to a wired suspension assembly configured to move the camera along one or more axes to capture images of persons moving on a sporting field. In another example, the camera can be mounted on an aerial drone device. In some embodiments, the camera may be portable and located in a housing that may be fixed to another object, such as a static object, for example a tree, or a movable object, for example a helmet. In another example, the camera may be located in a housing that may be hand-held. In some embodiments, the multiple image sensors 206 may be part of a single three-dimensional (3-D) camera such as a stereoscopic camera. In embodiments with a 3-D camera or multiple image sensors, the distance to an object or its approximate size may be determined based on the images from each of the image sensors.
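By way of non-limiting illustration only, the distance estimation mentioned above might be sketched as follows using the usual rectified-stereo relationship (distance equals focal length times baseline divided by disparity). The focal length, baseline, disparity, and function names below are assumptions made for this sketch and are not taken from the disclosure.

```python
# Hypothetical sketch: recover distance and approximate size from two offset
# image sensors using a simple pinhole-camera model. All values are assumed.

def estimate_distance(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Distance Z = f * B / d for a rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("object must appear offset between the two sensors")
    return focal_px * baseline_m / disparity_px

def estimate_size(width_px: float, distance_m: float, focal_px: float) -> float:
    """Approximate real-world width of an object from its pixel width."""
    return width_px * distance_m / focal_px

if __name__ == "__main__":
    z = estimate_distance(focal_px=1400.0, baseline_m=0.12, disparity_px=28.0)
    w = estimate_size(width_px=220.0, distance_m=z, focal_px=1400.0)
    print(f"estimated distance: {z:.1f} m, estimated width: {w:.2f} m")
```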
[0048] In some embodiments, there may be an additional user input tool, for example a keyboard and/or mouse in communication with the processor 202, or integrated into components of the system 200, such as a display device similar to display 112 having touch-screen functionality. These user input tools may be used for certain options, such as selecting rules for alarms or notifications, defining areas to exclude from analysis, or activating or deactivating the tracking functionality for particular periods of time, for example disabling object movement tracking and analysis between different intervals. The display may be a two-dimensional display screen such as an LED, LCD, or OLED display. In some embodiments, the display may be a VR device such as a headset, an AR device such as a head-mounted display with a translucent or transparent screen.
[0049] FIG. 3 is a flow diagram of a method 300 for training models for performing image processing techniques, according to some embodiments. At block 302, the method 300 includes establishing a visual environment in images. Establishing the visual environment may include identifying static or non-moving areas that are not tracked and identifying moving objects in the images for tracking. Establishing the visual environment may include classifying objects in the images.
[0050] The visual environment may be the vertical and horizontal area captured by an image sensor such as, for example, image sensor 110 in FIG. 1. That is, the visual environment may represent the visual range of the image sensor. The visual environment may correspond to the characteristics of images provided to a user via a display such as, for example, display 112 in FIG. 1. These characteristics may include, for example, the field of view and the resolution of the displayed image.
[0051] Identified static or non-moving areas (e.g., no movement areas) may be determined by a lack of change in a portion of the visual environment (e.g., pixel or pixels) across multiple frames of the visual environment captured by the image sensor. The aspects of the static or non-moving areas that do not change may include the color, hue, shade or brightness of those areas. Static or non-moving areas may include pixels representative of the ground and fixed features such as, for example, trees, structural members, and the like, which are not moving relative to the image sensor. For example, the image sensors may be at an elevated fixed position and may capture images of the moving objects and the ground during an application such as during a horse race. Static or non-moving areas do not trigger any alerts or tracking as will be further described herein.
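A minimal sketch of one way the static-area determination described above could be expressed follows, assuming a stack of grayscale frames and an arbitrary change threshold; none of the names or values below are specified by the disclosure.

```python
# Hypothetical sketch: mark pixels as static (non-moving) when their average
# frame-to-frame change stays below a threshold. Frames and threshold are
# illustrative assumptions.
import numpy as np

def static_area_mask(frames: np.ndarray, change_threshold: float = 4.0) -> np.ndarray:
    """frames: (num_frames, height, width) grayscale stack; returns a boolean
    mask that is True where the pixel changed too little to warrant tracking."""
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0))  # frame-to-frame change
    mean_change = diffs.mean(axis=0)                            # average change per pixel
    return mean_change < change_threshold

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = np.tile(rng.integers(0, 255, (1, 48, 64)), (10, 1, 1)).astype(np.float32)
    frames[:, 10:20, 20:30] += rng.normal(0, 30, (10, 10, 10))  # a small "moving" patch
    mask = static_area_mask(frames)
    print(f"{mask.mean():.0%} of the visual environment treated as static")
```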
[0052] At block 304, the method 300 includes identifying objects in the visual environment of the images. Identifying the objects may include identifying regions in the image or images corresponding to the background and one or more moving objects in the image. The image may include one or more objects moving in the image or images. The objects may include, for example, first object, second object, and through nth objects captured in an image or series of images during a certain time period. In some embodiments, one or more objects may be captured in images by an image sensor and displayed on a display device such that a target object for tracking may be identified based on one or more inputs received at an input device. For example, the inputs may correspond to a user selection of the target object. In another example, the inputs may correspond to a region of the image including therein at least a portion of the image and the one or more techniques herein can analyze the selected region and identify the target object.
[0053] Identifying the objects in the images may include identifying regions in the image or series of images corresponding to pixels representative of objects, or pixels representative of a target object among one or more objects. For example, a boundary may be identified in the images defining the object, the boundary including the object’s torso and any appendages. Identifying the objects may include classifying objects. For example, the images may capture more than one moving object in the image. In this regard, one or more objects and/or one or more different types of objects including a target object may be identified in the images for tracking. For example, the tracking may be performed on a particular object, e.g., athlete, in a group of other similar objects, e.g., other athletes, in the images.
[0054] In some embodiments, user input may be used to define regions which are excluded from the analysis. Excluded regions may include, for example, areas unlikely to contain an object of interest, for example the ground, areas with an excess of distractors such as a crowd in the background, or areas which are not useful to the user such as outside a playing surface. The user input for defining the region may be interaction with a display or user interface showing the view of the image sensor, with the input being a touch-screen input such as drawing a region to exclude or using two or more fingers to define the region to be excluded, or selection using a cursor such as one controlled by a mouse, track-pad, joy-stick or other such control interfacing with the processor and the display. The excluded regions may be removed from the portions of the image which are analyzed, may be treated as non-moving or static areas (e.g., no movement areas), or may be treated as distractors. For example, the excluded regions may be cropped out of the image sensor images prior to their processing in establishing the visual environment.
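For illustration only, the excluded-region handling described above might look like the following sketch, in which user-drawn rectangles are turned into a mask applied before analysis; the rectangle coordinates and function names are hypothetical.

```python
# Hypothetical sketch: honor user-defined exclusion regions (for example,
# rectangles drawn on a touch screen) by masking those pixels out before the
# frames are analyzed. Coordinates are illustrative.
import numpy as np

def exclusion_mask(height, width, rects):
    """rects are (top, left, bottom, right) regions to exclude; returns a
    boolean array that is True where pixels should still be analyzed."""
    keep = np.ones((height, width), dtype=bool)
    for top, left, bottom, right in rects:
        keep[top:bottom, left:right] = False
    return keep

def apply_exclusions(frame, keep):
    """Zero out excluded pixels so they behave like no-movement areas."""
    return np.where(keep, frame, 0)

if __name__ == "__main__":
    # e.g. a crowd strip at the top and an off-field corner at the bottom right
    keep = exclusion_mask(48, 64, [(0, 0, 10, 64), (30, 50, 48, 64)])
    frame = np.full((48, 64), 128, dtype=np.uint8)
    print(apply_exclusions(frame, keep).sum())
```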
[0055] At block 306, the method 300 includes identifying joints for the target object and associating the joints with the moving object in the visual environment. Identifying the joints may include identifying, based on the region of pixels identified as representing the target moving object, tracklets corresponding to appendages and joints of the object identified based on rendering a frame of the object. For example, the joints may correspond to elbows, knees, ankles, and wrists of a person being tracked in the images. Referring to FIG. 3, a target object may include, for example, a torso, a first appendage having a first joint associated therewith, a second appendage having first and second joints associated therewith, and so on through an nth appendage having an nth joint associated therewith.
[0056] Identifying the set of joints may include identifying, for each image, a set of tracklets based on a frame of the object, each tracklet in the set of tracklets being associated with a respective limb of the object or a segment extending between respective joints at the ends of the tracklet.
[0057] In some embodiments, identifying the joints may include determining a 2-D frame of the moving object based on the 2-D images, and rendering a 3-D frame of the object based on applying one or more algorithms to the 2-D frame to map the object in a 3-D coordinate system, the rendering process for generating the 3-D model being capable of compensating for occlusion due to using 2-D images as input.
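For illustration, per-frame joint keypoints could be organized into per-limb tracklets along the lines of the sketch below; the joint names, limb pairs, and coordinates are assumptions, and the pose-estimation step that would produce the keypoints is assumed rather than shown.

```python
# Hypothetical sketch: organize per-frame joint keypoints into per-limb
# "tracklets" (one sequence of segment endpoints per limb across frames).
from collections import defaultdict

# Assumed limb definitions: each limb is a segment between two joints.
LIMBS = {
    "left_forearm": ("left_elbow", "left_wrist"),
    "right_shin": ("right_knee", "right_ankle"),
}

def build_tracklets(frames_of_joints):
    """frames_of_joints: list of {joint_name: (x, y)} dicts, one per frame.
    Returns {limb_name: [((x1, y1), (x2, y2)), ...]} ordered by frame."""
    tracklets = defaultdict(list)
    for joints in frames_of_joints:
        for limb, (a, b) in LIMBS.items():
            if a in joints and b in joints:          # skip occluded joints
                tracklets[limb].append((joints[a], joints[b]))
    return dict(tracklets)

if __name__ == "__main__":
    frames = [
        {"left_elbow": (100, 200), "left_wrist": (130, 250),
         "right_knee": (90, 300), "right_ankle": (95, 360)},
        {"left_elbow": (105, 198), "left_wrist": (140, 245),
         "right_knee": (92, 298), "right_ankle": (99, 355)},
    ]
    tracklets = build_tracklets(frames)
    print(len(tracklets["left_forearm"]), "frames tracked for the left forearm")
```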
[0058] At block 308, the method 300 includes determining a pattern of motion of the target object based on a movement of the joints in the visual environment and generating an output dataset corresponding to the pattern of motion of the object. In some embodiments, the pattern of motion may include a joint range of the moving object. Determining the joint range may include tracking a position of the object's joints and determining a pattern of motion of the target object based on determining rotational limits, acceleration, and speed of the joints. This pattern data, including the relative joint positions, rotational limits, acceleration, and speed of the joints, may be stored in the memory. Determining the joint range may also include, based on the pattern of motion of the joints, determining an overall speed relative to the real-life environment, acceleration data, and direction of the moving object. In some embodiments, determining the pattern of motion for the object may include calculating, between each image of the set of images, movement of the respective set of tracklets to determine the pattern of motion for the object.
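A purely illustrative sketch of reducing tracked joint positions to the speed, acceleration, and joint-angle values discussed at block 308 follows; the coordinates, frame rate, and function names are assumptions for this example rather than anything prescribed by the disclosure.

```python
# Hypothetical sketch: derive joint speed, acceleration, and a joint angle
# from per-frame joint positions. All values are illustrative.
import numpy as np

def joint_speed_and_acceleration(positions: np.ndarray, fps: float):
    """positions: (num_frames, 2) pixel coordinates of one joint."""
    velocity = np.diff(positions, axis=0) * fps          # pixels per second
    speed = np.linalg.norm(velocity, axis=1)
    acceleration = np.diff(speed) * fps                  # change in speed per second
    return speed, acceleration

def joint_angle(parent: np.ndarray, joint: np.ndarray, child: np.ndarray) -> float:
    """Angle (degrees) at `joint` formed by the parent and child joints."""
    u, v = parent - joint, child - joint
    cosine = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0))))

if __name__ == "__main__":
    wrist = np.array([[130, 250], [140, 245], [152, 238], [165, 230]], dtype=float)
    speed, accel = joint_speed_and_acceleration(wrist, fps=30.0)
    elbow_angle = joint_angle(np.array([100.0, 200.0]),
                              np.array([130.0, 250.0]),
                              np.array([160.0, 255.0]))
    print(speed.round(1), accel.round(1), round(elbow_angle, 1))
```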
[0059] Determining the joint range may include identifying areas or regions associated with the object and its appendages in an environment such as, for example, the visual environment, multi-dimensional coordinate system, or some other environment, where a rate of change in one or more aspects is greater than zero or a minimum value. In some embodiments, the rate of change may be determined based on exceeding a threshold value. The aspects which may change in the pixels of the object may include the color, hue, shade, and/or brightness of that area or object. In an embodiment, a determination is made that an object is moving based on whether one or more threshold values for the rate of change of the aspects of that area or object are satisfied. The size of the area of the visual environment where the aspects are changing may also be used to determine whether an object is moving and the rate at which the object or its appendages and joints are moving.
[0060] The threshold values for rate of change and/or the area over which aspects change may, in an embodiment, be one or more predetermined values. The predetermined values may be selected, for example, based on the use case in which this method is applied, for example using one set of threshold values for an embodiment based on tracking a racehorse performing a sequence of movements, while an embodiment for use in monitoring flight patterns of birds has a different set of predetermined threshold values. In an embodiment, the threshold value is selected (set) by a user via a user interface (e.g., a graphical user interface displayed on a display). These moving areas may include large areas including therein one or more objects, such as, in some examples, birds flying in a flock or stars moving across the sky. These objects may, in some embodiments, be user defined, for example through selection of regions via a user interface such as a touch screen. For objects, there may be an initial alert such as a presentation of a box around the object.
[0061] The moving objects are tracked continuously. As each image is processed, the object corresponding to the target object is identified and tracked as the target object moves in the image or in each frame of a series of images. When monitored, objects or areas corresponding to the target object may be identified, and in some embodiments, a tracking symbol such as an alert box may be displayed over the target object.
[0062] At block 310, the method 300 includes recording training data corresponding to movement data generated based on tracking the target object in the images. The movement data may include, but is not limited to, joint position data, pattern data, rotational limits, accelerations, joint speed, joint direction, overall speed, overall accelerations, overall direction, metadata, other movement characteristics, or any combinations thereof. The data may include other types of data including object type classifications, species, gender, classifications of inanimate or non-moving objects, other definitions, or any combinations thereof.
[0063] The training data may be utilized by the one or more techniques to enable performing the object tracking using one or more models as described herein. One or more data points generated based on the tracking may also be combined with the training data to iteratively update the reference data to provide improved object tracking functionality by the one or more techniques and one or more models. In this regard, the models may be trained using the training data and updating the training data iteratively updates the models to provide improved functionality such as improved object classification, tracking, joint identification, pattern determination, and determination of differences between new images and the training data, as will be further described herein. According to some embodiments, any of the training data is used to train a machine-learning device, system, or both to produce an improved device, system, or both. According to some embodiments, any of the training data is used to train an artificial intelligence (Al) device, system, or both to produce an improved device, system, or both. According to some embodiments, the training data is not used to train an Al device, system, or both.
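As one hypothetical arrangement only, the training data recorded at block 310 could be captured in records like the following and folded back into the reference/training dataset; the field names and the JSON-lines storage format are assumptions for illustration and are not prescribed by the disclosure.

```python
# Hypothetical sketch: a per-object motion record appended to a training set.
# Field names and file format are assumed for illustration.
import json
from dataclasses import dataclass, asdict, field

@dataclass
class MotionRecord:
    object_type: str                      # e.g. "horse", "athlete"
    joint_positions: dict                 # joint -> list of (x, y) per frame
    joint_speeds: dict                    # joint -> list of speeds
    rotational_limits: dict               # joint -> (min_deg, max_deg)
    overall_speed: float
    metadata: dict = field(default_factory=dict)

def append_to_training_set(record: MotionRecord, path: str = "training_data.jsonl") -> None:
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(record)) + "\n")

if __name__ == "__main__":
    record = MotionRecord(
        object_type="athlete",
        joint_positions={"left_wrist": [(130, 250), (140, 245)]},
        joint_speeds={"left_wrist": [353.6]},
        rotational_limits={"left_elbow": (35.0, 150.0)},
        overall_speed=4.2,
        metadata={"captured": "2024-05-01T10:00:00Z"},
    )
    append_to_training_set(record)
```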
[0064] At any of blocks 302, 304, 306, 308, and 310, the method 300 may include displaying images from the image sensor and/or any of the data generated based on performing the image processing techniques at a display device. For example, the display may show the images from the image sensor together with an outline of the target object. In another example, the display may show the images and a frame associated with the target object, and data values corresponding to joint positions, speed, rotation, direction, and the like.
[0065] FIG. 4 is a flow diagram of a method 400 for performing the image processing techniques, according to some embodiments. At block 402, the method 400 includes obtaining an image or obtaining a series of images from an image sensor. The image sensor may correspond to, for example, image sensor 110 in FIG. 1. In some embodiments, the image sensor may correspond to at least one of the multiple image sensors 206 in FIG. 2. In some embodiments, obtaining the image may include receiving a first dataset comprising image data corresponding to a plurality of images from at least one image sensor.
[0066] At block 402, the images may be captured during a period of time occurring after the period of time the images were processed at block 302. That is, the images and the processing of the images to generate the training data, as shown in FIG. 3, may correspond to historical image data captured and/or processed during a period of time occurring before the time period the images at block 402 were captured.
[0067] At block 404, the method 400 includes analyzing the image data to establish a visual environment based on the images. The method 400 for establishing the visual environment may be similar to operations performed at block 302 in FIG. 3, according to some embodiments. A model such as, for example, a computer vision model trained on the training data may be utilized to analyze the images and determine the visual environment based on the images.
[0068] At block 406, the method 400 includes identifying objects in the visual environment of the images, the objects including a target moving object. In some embodiments, the target object may be identified from a plurality of different objects or different types of objects. The method 400 for identifying the objects in the visual environment may be similar to the operations performed at block 304 in FIG. 3, according to some embodiments. The model such as, for example, the computer vision model trained on the training data may be utilized to analyze the images and identify the objects in the visual environment.
[0069] In some embodiments, identifying the objects includes classifying the moving object or objects based on a comparison of the target objects to the training dataset. In this regard, the method 400 may include obtaining the training dataset such as, for example, from block 310 in FIG. 3, to perform the object classification.
[0070] At block 408, the method 400 includes identifying joints of the target object in the visual environment of the images. The method 400 for identifying joints of the target object in the visual environment may be similar to operations performed at block 306 in FIG. 3, according to some embodiments. The model such as, for example, the computer vision model trained on the training data may be utilized to analyze the images and identify the joints of the target object as the object moves in the visual environment of the images. In some embodiments, the method 400 may include identifying a set of skeletal joints corresponding to a body of the target object and corresponding to, for example, a torso and limbs of the target moving object.
[0071] At block 410, the method 400 includes identifying a joint range of the target object in the visual environment of the images. The method 400 for identifying the joint range of the target object in the visual environment may be similar to operations performed at block 308 in FIG. 3, according to some embodiments. The model such as, for example, the computer vision model trained on the training data may be utilized to analyze the images and identify the joint range of the target object as the object moves in the visual environment of the images. In some embodiments, identifying the joint range may include tracking one or more factors including, but not limited to, a position, rotation, acceleration, and direction of the set of skeletal joints.
[0072] At block 412, the method 400 includes comparing the data generated based on processing the images at blocks 402, 404, 406, 408, and 410 to the training data. In this regard, the training data may be obtained prior to, or during, block 412. The training data may correspond to, for example, training data generated at block 310 in FIG. 3.
[0073] In some embodiments, comparing the dataset to the training dataset further includes obtaining a set of second tracklets from the training dataset, and comparing the set of tracklets determined at blocks 408 and 410 with the set of second tracklets to calculate the differences therebetween.
[0074] At block 414, the method 400 includes identifying differences between the data generated from tracking the target moving object at blocks 402, 404, 406, 408, and 410, relative to the training data. These differences may correspond to differences in a relative position of the object in the visual environment, differences in position of the object's torso and limbs, proportional differences, rotational differences, acceleration differences, speed differences, other differences, or any combinations thereof. Identifying these differences, or deltas, may include performing one or more calculations to identify these differences. In this regard, the training data may include therein corresponding data associated with the target object, one or more different objects similar in type to the target object, one or more different types of objects than the target object, or any combinations thereof. In some embodiments, one or more of the differences between the data corresponding to the pattern of motion and the training data exceeding a predetermined threshold value or values is indicative of an anomalous movement sequence by the object. In some embodiments, the differences between the data generated from tracking the target moving object and the training data exceeding the predetermined threshold value or values may be indicative of physical issues with the object.
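As a hedged illustration of the comparison and thresholding described at blocks 412 and 414, the sketch below computes per-joint differences between a captured dataset and a reference/training dataset and flags joints whose difference exceeds a predetermined threshold; all names, values, and thresholds are hypothetical.

```python
# Hypothetical sketch: compare a captured motion dataset to a reference
# dataset and flag joints whose mean displacement exceeds a per-joint
# threshold. Values are illustrative.
import numpy as np

def motion_differences(current: dict, reference: dict) -> dict:
    """current/reference map joint name -> list of (x, y) per frame;
    returns joint -> mean per-frame displacement from the reference."""
    deltas = {}
    for joint in current.keys() & reference.keys():
        n = min(len(current[joint]), len(reference[joint]))   # align by frame index
        diff = np.asarray(current[joint][:n]) - np.asarray(reference[joint][:n])
        deltas[joint] = float(np.linalg.norm(diff, axis=1).mean())
    return deltas

def flag_anomalies(deltas: dict, thresholds: dict) -> list:
    """Joints whose difference exceeds the predetermined threshold, which the
    disclosure treats as potentially indicative of a physical issue."""
    return [j for j, d in deltas.items() if d > thresholds.get(j, float("inf"))]

if __name__ == "__main__":
    current = {"right_knee": [(90, 300), (92, 298)], "right_ankle": [(95, 360), (99, 355)]}
    reference = {"right_knee": [(90, 300), (91, 299)], "right_ankle": [(95, 360), (96, 358)]}
    deltas = motion_differences(current, reference)
    print(deltas, flag_anomalies(deltas, {"right_knee": 1.0, "right_ankle": 1.0}))
```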
[0075] At any of blocks 402, 404, 406, 408, 410, 412, 414, and 416, the method 400 may include displaying images from the image sensor and/or any of the data generated based on performing the image processing techniques of method 400 at a display device. For example, the display may show images from the image sensor including an outline of the target object and a visual representation of the differences between the object in the images as compared to the training data (e.g., differences in limb position, rotation, speed, directions, etc.). In another example, the display may show images obtained at block 402 overlaid with images from the training data and including data values corresponding to joint positions, speed, rotation, direction, and the like. In yet another example, the display may show a graphical user interface configured to receive inputs from a user to enable performing the image processing techniques in accordance with the present disclosure. In some embodiments, displaying image data at a display device may include displaying a visual indication of differences between the position and direction of the set of skeletal joints of the moving object based on a comparison to the reference dataset.
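One minimal way to render the visual indication described above is sketched below, drawing a hollow highlight box around a flagged joint on an image array; the image size, color, and coordinates are illustrative assumptions rather than a prescribed display format.

```python
# Hypothetical sketch: draw a hollow highlight rectangle around a flagged
# joint on an (H, W, 3) image. Coordinates, size, and color are assumed.
import numpy as np

def draw_highlight(image: np.ndarray, center, half_size: int = 8,
                   color=(255, 0, 0)) -> np.ndarray:
    """Return a copy of `image` with a hollow box centered on `center`."""
    out = image.copy()
    y, x = center
    top, bottom = max(y - half_size, 0), min(y + half_size, out.shape[0] - 1)
    left, right = max(x - half_size, 0), min(x + half_size, out.shape[1] - 1)
    out[top, left:right + 1] = color
    out[bottom, left:right + 1] = color
    out[top:bottom + 1, left] = color
    out[top:bottom + 1, right] = color
    return out

if __name__ == "__main__":
    frame = np.zeros((120, 160, 3), dtype=np.uint8)
    annotated = draw_highlight(frame, center=(60, 80))   # e.g. a flagged knee joint
    print(int((annotated != 0).any(axis=2).sum()), "highlighted pixels")
```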
[0076] FIG. 5 is a graphical diagram illustrating a non-limiting example of a type of movement which may be detected and tracked, according to some embodiments. In FIG. 5, images 502 captured by an image sensor or sensors such as, for example, image sensor 110 in FIG. 1, may be displayed on display device 504. In these embodiments, the x-axis 506 is horizontal with respect to the orientation of the image sensor, the y-axis 508 is vertical with respect to the orientation and position of the image sensor, and the z-axis 510 is the direction from the image sensor to the tracked object. The axes of movement of detected objects are relative to the image captured by the image sensor or sensors. Each frame of the images may show the target object in a respective position such that the series of images may be translated to movement of the target object or movement of portions of the target object. In FIG. 5, the object's movement corresponds to movement of the object's appendages (limbs) and joints relative to the x-axis 506, y-axis 508, and z-axis 510.
[0077] The display device 504 shows a visual environment 512 determined based on the images captured by the image sensor or sensors, and a target object 514 identified in the images for tracking. The display device 504 may also show, according to some embodiments, a mapping of a frame 516 and corresponding joints 518 associated with the object 514 in the visual environment 512. The joint range and pattern of movement of object 514 may be determined by calculating the translation of the joints 518 along a multi-axial coordinate system such as, for example, x-axis 506, y-axis 508, and z-axis 510 to determine factors such as, for example, position, proportion, rotation, speed, acceleration, direction, other like factors, or any combinations thereof. This data can then be used as training data to train models to perform the image processing techniques in accordance with the present disclosure, or compared to training data to determine differences in movement of the object 514 as compared to object(s) in the training data.
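For illustration of the multi-axial translation described with respect to FIG. 5, the sketch below reduces per-frame 3-D coordinates to per-axis displacement, a mean speed, and a direction of travel; the coordinates and frame rate are assumptions for this example.

```python
# Hypothetical sketch: translate per-frame 3-D coordinates (x, y, z) of a
# joint or object centroid into per-axis displacement, mean speed, and a
# unit direction of travel. Values are illustrative.
import numpy as np

def per_axis_motion(points: np.ndarray, fps: float):
    """points: (num_frames, 3) coordinates over time."""
    displacement = points[-1] - points[0]                      # net motion along x, y, z
    step_speeds = np.linalg.norm(np.diff(points, axis=0), axis=1) * fps
    norm = np.linalg.norm(displacement)
    direction = displacement / norm if norm > 0 else displacement
    return displacement, float(step_speeds.mean()), direction

if __name__ == "__main__":
    centroid = np.array([[0.0, 0.0, 10.0], [0.2, 0.05, 9.8], [0.4, 0.1, 9.6]])
    disp, mean_speed, direction = per_axis_motion(centroid, fps=30.0)
    print(disp, round(mean_speed, 2), direction.round(2))
```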
[0078] All prior patents and publications referenced herein are incorporated by reference in their entireties.
[0079] Throughout the specification and claims, the following terms take the meanings explicitly associated herein, unless the context clearly dictates otherwise. The phrases "in one embodiment," “in an embodiment,” and "in some embodiments" as used herein do not necessarily refer to the same embodiment(s), though it may. Furthermore, the phrases "in another embodiment" and "in some other embodiments" as used herein do not necessarily refer to a different embodiment, although it may. All embodiments of the disclosure are intended to be combinable without departing from the scope or spirit of the disclosure.
[0080] As used herein, the term "based on" is not exclusive and allows for being based on additional factors not described, unless the context clearly dictates otherwise. In addition, throughout the specification, the meaning of "a," "an," and "the" include plural references. The meaning of "in" includes "in" and "on."
[0081] As used herein, the term “between” does not necessarily require being disposed directly next to other elements. Generally, this term means a configuration where something is sandwiched by two or more other things. At the same time, the term “between” can describe something that is directly next to two opposing things. Accordingly, in any one or more of the embodiments disclosed herein, a particular structural component being disposed between two other structural elements can be: disposed directly between both of the two other structural elements such that the particular structural component is in direct contact with both of the two other structural elements; disposed directly next to only one of the two other structural elements such that the particular structural component is in direct contact with only one of the two other structural elements; disposed indirectly next to only one of the two other structural elements such that the particular structural component is not in direct contact with only one of the two other structural elements, and there is another element which juxtaposes the particular structural component and the one of the two other structural elements; disposed indirectly between both of the two other structural elements such that the particular structural component is not in direct contact with both of the two other structural elements, and other features can be disposed therebetween; or any combination(s) thereof.
[0082] The following Clauses provide exemplary embodiments according to the disclosure herein. Any feature(s) in any of the Clause(s) can be combined with any other Clause(s).
Clause 1. A system comprising: a processor; and a non-transitory computer readable medium having stored therein instructions that are executable by the processor to perform operations for tracking object movement, the operations comprising: establish a visual environment based on a set of images; identify an object in the visual environment; identify a set of joints for the object and associate the set of joints with the object in the visual environment; determine a pattern of motion based on a movement of the set of joints in the visual environment and generate a dataset corresponding to the pattern of motion of the object; compare the dataset to a training dataset; identify differences between the dataset and the training dataset; and display a visual representation of the differences between the dataset and the training dataset at a display device.
Clause 2. The system of clause 1, wherein the processor further performs operations comprising: receive image data captured by an image sensor, the image data corresponding to the set of images of a scene comprising the object. Clause 3. The system according to any of clauses 1-2, wherein the processor further performs operations comprising: comparing the differences to a reference dataset to predict a physical issue with the object.
Clause 4. The system according to any of clauses 1-3, further comprising: an image sensor, wherein the image sensor captures the set of images; and the display device, wherein the display device displays the set of images.
Clause 5. The system according to any of clauses 1-4, wherein the display device further comprises a touch-screen interface integral with the display device configured to receive one or more inputs from a user.
Clause 6. The system according to any of clauses 1-5, further comprising: a user input device configured to receive one or more inputs from a user.
Clause 7. The system according to any of clauses 1-6, wherein identifying the object in the visual environment further comprises: identifying a first region in the visual environment corresponding to pixels representative of the object, wherein the first region is identified based on an input received from a user corresponding to the object for tracking.
Clause 8. The system according to any of clauses 1-7, wherein identifying the object in the visual environment further comprises: identifying a second region in the visual environment corresponding to pixels representative of other objects; and identifying a third region in the visual environment corresponding to pixels representative of a background.
Clause 9. The system according to any of clauses 1-8, wherein identifying the set of joints further comprises: identify, for each image, a set of tracklets based on a skeletal frame of the object, each tracklet in the set of tracklets being associated with a respective limb of the object; and calculate, between each image, movement of the respective set of tracklets to determine the pattern of motion for the object.
Clause 10. The system according to any of clauses 1-9, wherein comparing the dataset to the training dataset further comprises: obtain a set of second tracklets from the training dataset; compare the set of tracklets with the set of second tracklets to calculate the differences therebetween; and determine that one or more of the differences exceed a predetermined threshold indicative of an anomalous movement sequence by the object, wherein the differences exceeding the predetermined threshold are indicative of physical issues with the object.
Clause 11. The system according to any of clauses 1-10, wherein the processor further performs operations comprising: train a model using training data to enable improved tracking of objects based on the set of images, the training data comprising the dataset corresponding to the pattern of motion of the object. Clause 12. A computer-implemented method comprising: establishing a visual environment based on a set of images; identifying an object in the visual environment; identifying a set of joints for the object and associating the set of joints with the object in the visual environment; determining a pattern of motion based on a movement of the set of joints in the visual environment and generating a dataset corresponding to the pattern of motion of the object; and generating a training dataset comprising data corresponding to the pattern of motion of the object and training a model using the training dataset to enable improved tracking of objects based on captured images.
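Clauses 11 and 12 describe folding the motion dataset back into training data so that a model can be trained to improve tracking over time. The fragment below sketches one way labeled motion patterns could be accumulated and fitted with an off-the-shelf classifier; the use of scikit-learn, the flattened feature layout, and the "typical"/"anomalous" labels are assumptions, not the disclosed training procedure.

```python
# Illustrative training loop for Clauses 11-12: flattened motion patterns are
# accumulated as labeled examples and fitted with a standard classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

class MotionTrainingSet:
    def __init__(self) -> None:
        self.examples: list[np.ndarray] = []
        self.labels: list[str] = []

    def add(self, pattern_of_motion: np.ndarray, label: str) -> None:
        """Store one motion pattern (frames x joints x 2) with its label."""
        self.examples.append(pattern_of_motion.reshape(-1))
        self.labels.append(label)

    def fit(self) -> RandomForestClassifier:
        model = RandomForestClassifier(n_estimators=50, random_state=0)
        model.fit(np.stack(self.examples), self.labels)
        return model

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    dataset = MotionTrainingSet()
    for _ in range(5):
        dataset.add(rng.normal(0.0, 0.05, size=(9, 4, 2)), "typical")
        dataset.add(rng.normal(0.0, 0.50, size=(9, 4, 2)), "anomalous")
    model = dataset.fit()
    probe = rng.normal(0.0, 0.05, size=(9, 4, 2)).reshape(1, -1)
    print(model.predict(probe))                    # likely ['typical']
```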
Clause 13. The method of clause 12, the method further comprising: displaying the dataset corresponding to the pattern of motion of the object at a display device to provide a visual representation of the pattern of motion. Clause 14. The method according to any of clauses 12-13, wherein the training dataset further comprises one or more physical issues associated with the patterns of motion.
Clause 15. The method according to any of clauses 12-14, wherein identifying the object in the visual environment further comprises: identifying a first region in the images corresponding to pixels representative of the object, wherein the first region is identified based on an input received from a user corresponding to the object for tracking.
Clause 16. The method according to any of clauses 12-15, wherein identifying the object in the visual environment further comprises: identifying a second region in the images corresponding to pixels representative of other objects; and identifying a third region in the images corresponding to pixels representative of a background.
Clause 17. The method according to any of clauses 12-16, wherein identifying the set of joints further comprises: identifying, for each image, a set of tracklets based on a skeletal frame of the object, each tracklet in the set of tracklets being associated with a respective limb of the object; and calculating, between each image, movement of the respective set of tracklets to determine the pattern of motion for the object.
Clause 18. A non-transitory computer readable medium having stored therein instructions executable by a processor to perform operations for tracking moving objects in a set of images captured by an image sensor, the operations comprising: receive image data from the image sensor, the image data corresponding to a set of images comprising an object; establish a visual environment based on the set of images; identify a first region in the visual environment corresponding to pixels representative of the object; identify a set of joints for the object and associate the set of joints with the object in the visual environment; determine a pattern of motion based on a movement of the set of joints in the visual environment and generate a dataset corresponding to the pattern of motion of the object; compare the dataset to a training dataset and identify differences between the dataset and the training dataset; compare the differences to a reference dataset to predict a physical issue with the object; display a visual representation of the differences between the dataset and the training dataset at a display device; and train a model using training data to enable improved tracking of objects based on the set of images, the training data comprising the dataset corresponding to the pattern of motion of the object. Clause 19. The computer readable medium of clause 18, wherein identifying the set of joints further comprises: identifying, for each image, a set of tracklets based on a skeletal frame of the object, each tracklet in the set of tracklets being associated with a respective limb of the object; and calculating, between each image, a movement of the respective set of tracklets to determine the pattern of motion for the object.
Clause 20. The computer readable medium according to any of clauses 18-19, wherein comparing the dataset to the training dataset further comprises: obtaining a set of second tracklets from the training dataset; comparing the set of tracklets with the set of second tracklets to calculate the differences therebetween; and determining that one or more of the differences exceed a predetermined threshold indicative of an anomalous movement sequence by the object, wherein differences exceeding the predetermined threshold are indicative of physical issues with the object.

Claims

CLAIMS What is claimed is:
1. A system comprising: a processor; and a non-transitory computer readable medium having stored therein instructions that are executable by the processor to perform operations for tracking object movement, the operations comprising: establish a visual environment based on a set of images; identify an object in the visual environment; identify a set of joints for the object and associate the set of joints with the object in the visual environment; determine a pattern of motion based on a movement of the set of joints in the visual environment and generate a dataset corresponding to the pattern of motion of the object; compare the dataset to a training dataset; identify differences between the dataset and the training dataset; and display a visual representation of the differences between the dataset and the training dataset at a display device.
2. The system of claim 1, wherein the processor further performs operations comprising: receive image data captured by an image sensor, the image data corresponding to the set of images of a scene comprising the object.
3. The system of claim 1, wherein the processor further performs operations comprising: comparing the differences to a reference dataset to predict a physical issue with the object.
4. The system of claim 1, further comprising: an image sensor, wherein the image sensor captures the set of images; and the display device, wherein the display device displays the set of images.
5. The system of claim 4, wherein the display device further comprises a touchscreen interface integral with the display device configured to receive one or more inputs from a user.
6. The system of claim 1, further comprising: a user input device configured to receive one or more inputs from a user.
7. The system of claim 1, wherein identifying the object in the visual environment further comprises: identifying a first region in the visual environment corresponding to pixels representative of the object, wherein the first region is identified based on an input received from a user corresponding to the object for tracking.
8. The system of claim 7, wherein identifying the object in the visual environment further comprises: identifying a second region in the visual environment corresponding to pixels representative of other objects; and identifying a third region in the visual environment corresponding to pixels representative of a background.
9. The system of claim 1, wherein identifying the set of joints further comprises: identifying, for each image, a set of tracklets based on a skeletal frame of the object, each tracklet in the set of tracklets being associated with a respective limb of the object; and calculating, between each image, movement of the respective set of tracklets to determine the pattern of motion for the object.
10. The system of claim 1, wherein comparing the dataset to the training dataset further comprises: obtaining a set of second tracklets from the training dataset; comparing the set of tracklets with the set of second tracklets to calculate the differences therebetween; and determining that one or more of the differences exceed a predetermined threshold indicative of an anomalous movement sequence by the object, wherein differences exceeding the predetermined threshold are indicative of physical issues with the object.
11. The system of claim 1, wherein the processor further performs operations comprising: train a model using training data to enable improved tracking of objects based on the set of images, the training data comprising the dataset corresponding to the pattern of motion of the object.
12. A computer-implemented method comprising: establishing a visual environment based on a set of images; identifying an object in the visual environment; identifying a set of joints for the object and associating the set of joints with the object in the visual environment; determining a pattern of motion based on a movement of the set of joints in the visual environment and generating a dataset corresponding to the pattern of motion of the object; and generating a training dataset comprising data corresponding to the pattern of motion of the object and training a model using the training dataset to enable improved tracking of objects based on captured images.
13. The method of claim 12, the method further comprising: displaying the dataset corresponding to the pattern of motion of the object at a display device to provide a visual representation of the pattern of motion.
14. The method of claim 12, wherein the training dataset further comprises one or more physical issues associated with the patterns of motion.
15. The method of claim 12, wherein identifying the object in the visual environment further comprises: identifying a first region in the images corresponding to pixels representative of the object, wherein the first region is identified based on an input received from a user corresponding to the object for tracking.
16. The method of claim 12, wherein identifying the object in the visual environment further comprises: identifying a second region in the images corresponding to pixels representative of other objects; and identifying a third region in the images corresponding to pixels representative of a background.
17. The method of claim 12, wherein identifying the set of joints further comprises: identifying, for each image, a set of tracklets based on a skeletal frame of the object, each tracklet in the set of tracklets being associated with a respective limb of the object; and calculating, between each image, movement of the respective set of tracklets to determine the pattern of motion for the object.
18. A non-transitory computer readable medium having stored therein instructions executable by a processor to perform operations for tracking moving objects in a set of images captured by an image sensor, the operations comprising: receive image data from the image sensor, the image data corresponding to a set of images comprising an object; establish a visual environment based on the set of images; identify a first region in the visual environment corresponding to pixels representative of the object; identify a set of joints for the object and associate the set of joints with the object in the visual environment; determine a pattern of motion based on a movement of the set of joints in the visual environment and generate a dataset corresponding to the pattern of motion of the object; compare the dataset to a training dataset and identify differences between the dataset and the training dataset; compare the differences to a reference dataset to predict a physical issue with the object; display a visual representation of the differences between the dataset and the training dataset at a display device; and train a model using training data to enable improved tracking of objects based on the set of images, the training data comprising the dataset corresponding to the pattern of motion of the object.
19. The computer readable medium of claim 18, wherein identifying the set of joints further comprises: identifying, for each image, a set of tracklets based on a skeletal frame of the object, each tracklet in the set of tracklets being associated with a respective limb of the object; and calculating, between each image, a movement of the respective set of tracklets to determine the pattern of motion for the object.
20. The computer readable medium of claim 19, wherein comparing the dataset to the training dataset further comprises: obtaining a set of second tracklets from the training dataset; comparing the set of tracklets with the set of second tracklets to calculate the differences therebetween; and determining that one or more of the differences exceed a predetermined threshold indicative of an anomalous movement sequence by the object, wherein differences exceeding the predetermined threshold are indicative of physical issues with the object.
PCT/US2025/026991 2024-05-01 2025-04-30 Systems, devices, and computerized methods for tracking and displaying moving objects on mobile devices Pending WO2025231077A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202463641032P 2024-05-01 2024-05-01
US63/641,032 2024-05-01

Publications (1)

Publication Number Publication Date
WO2025231077A1 (en) 2025-11-06

Family

ID=97562173

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2025/026991 Pending WO2025231077A1 (en) 2024-05-01 2025-04-30 Systems, devices, and computerized methods for tracking and displaying moving objects on mobile devices

Country Status (1)

Country Link
WO (1) WO2025231077A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200272148A1 (en) * 2019-02-21 2020-08-27 Zoox, Inc. Motion prediction based on appearance
US11036989B1 (en) * 2019-12-11 2021-06-15 Snap Inc. Skeletal tracking using previous frames
US20220375119A1 (en) * 2021-05-07 2022-11-24 Ncsoft Corporation Electronic device, method, and computer readable storage medium for obtaining video sequence including visual object with postures of body independently from movement of camera
US20230300281A1 (en) * 2022-03-18 2023-09-21 Ncsoft Corporation Electronic device, method, and computer readable recording medium for synchronizing videos based on movement of body
US20240119353A1 (en) * 2022-10-07 2024-04-11 Toyota Jidosha Kabushiki Kaisha Training data generation method and training data generation system


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 25798609

Country of ref document: EP

Kind code of ref document: A1