
WO2024256017A1 - Enabling a contactless user interface for a computerized device equipped with a standard camera - Google Patents


Info

Publication number
WO2024256017A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
coordinates
graphical element
reference frame
keypoints
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/EP2023/066218
Other languages
French (fr)
Inventor
Florian HAUFE
Michele XILOYANNIS
Tiberiu Ioan MUSAT
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Akina Ag
Original Assignee
Akina Ag
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Akina Ag filed Critical Akina Ag
Priority to PCT/EP2023/066218 priority Critical patent/WO2024256017A1/en
Publication of WO2024256017A1 publication Critical patent/WO2024256017A1/en

Classifications

    • G06F3/011 — Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/012 — Head tracking input arrangements
    • G06F3/017 — Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G06F3/0481 — Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment
    • G06F3/0484 — Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object
    • G06V40/161 — Human faces: detection; localisation; normalisation
    • G06V40/165 — Detection, localisation or normalisation using facial parts and geometric relationships
    • G06V40/23 — Recognition of whole body movements, e.g. for sport training
    • G06V40/28 — Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G06V2201/033 — Recognition of patterns in medical or anatomical images of skeletal patterns

Definitions

  • the invention relates in general to the field of computer-implemented methods, computerized systems, and computer program products for enabling a contactless user interface, e.g., using a standard RGB camera.
  • it is directed to methods relying on changes in the relative size (i.e., an apparent dimension) of a given anthropometric feature of the user to compensate for changes in the user’s distance to the camera, as well as other user movements including rotations and translations parallel to the camera plane.
  • BACKGROUND Gesture recognition aims at interpreting human gestures (typically originating from the face or hand) through techniques of computer vision and image processing.
  • for example, kinetic user interfaces (KUIs) have been proposed, which allow users to interact with computing devices through the motion of objects and bodies.
  • Such interfaces typically involve sensorised gloves, stereo cameras, and gesture-based controllers. A drawback of such approaches is that they require specific equipment.
  • the present invention is embodied as a computer-implemented method of enabling a contactless user interface.
  • the method comprises executing a graphical user interface to display a graphical element on a display device and instructing a camera to repeatedly acquire images of a user.
  • the method further comprises instructing to perform the following steps, for each image of at least some of the images acquired.
  • a pose tracking algorithm is executed to update coordinates of keypoints of the user in a base reference frame corresponding to said each image. I.e., each image considered can be associated with a corresponding base frame.
  • the keypoints notably include a target keypoint, which will be used to guide the graphical element.
  • the method computes anthropometric data, which capture a relative size of an anthropometric feature of the user in said each image.
  • the method further applies a coordinate transformation to the updated coordinates to obtain transformed coordinates of the keypoints in a reference frame of the user.
  • the method rescales the transformed coordinates according to said anthropometric data.
  • the method updates a position of the graphical element in a reference frame of the display according to the rescaled coordinates of the target keypoint.
  • it sends a signal encoding the updated position to the display device to accordingly move the graphical element displayed.
  • a touchless user interface control is enabled, whereby the proposed method does not require any physical contact between the user and any physical device.
  • the method can typically rely on images acquired by a conventional camera, e.g., a standard RGB camera.
  • the computed anthropometric data includes a relative size of a geometric element, which is bounded by at least two selected ones of the keypoints.
  • the transformed coordinates are simply rescaled according to a scaling function taking as argument a ratio of a reference size to said relative size as computed in the base reference frame for said each image.
  • said relative size is a relative length of a line segment bounded by two selected ones of the keypoints, and said ratio is a ratio of a reference length to said length as computed for said each image.
  • the reference size corresponds to a value of said relative size as computed for an initial image of the repeatedly acquired images.
  • the reference size is determined in accordance with one or each of a width and a height of said each image.
  • the method further comprises, for said each image, applying a correction to compensate for an optical distortion (by the camera) of the updated coordinates of the keypoints in the base reference frame.
  • the camera is configured to repeatedly acquire said images at a given frame rate R1.
  • the pose tracking algorithm is instructed to execute for each image of only a subset of the repeatedly acquired images, at an average rate R2 that is strictly less than R1.
  • preferably, R1/2 ≤ R2 ≤ 2R1/3, which allows more time for the pose tracking algorithm used to infer the keypoints.
  • the method further comprises, for said each image, determining the coordinate transformation between the base reference frame and the reference frame of the user based on the updated coordinates of selected ones of the keypoints of the user in the base reference frame.
  • the selected ones of the keypoints include three non-collinear keypoints, which define the reference frame of the user.
  • the applied coordinate transformation combines a rotation and a translation, the rotation is defined based on Euler angles between said base reference frame and the reference frame of the user, and the translation is defined based on coordinates of a reference keypoint in the base reference frame.
  • the reference keypoint corresponds to an origin of the reference frame of the user. It is selected from the three non-collinear keypoints and defines the origin of the reference frame of the user.
  • the coordinate transformation may advantageously be defined and applied as a single matrix multiplication.
  • the single matrix multiplication is optionally offloaded to a graphics processor unit, in the interest of computational speed.
  • said graphical element is a first graphical element.
  • the execution of the graphical user interface may cause to display additional graphical elements on the display device, in particular a second graphical element.
  • the method further comprises running a monitoring algorithm to detect a potential action to be performed on the second graphical element based on a relative position, in the reference frame of the display, of the first graphical element and the second graphical element.
  • the monitoring algorithm includes one or each of: a computer vision algorithm to detect a particular gesture of the user triggering said action; and a timer triggering said action based on a time duration during which a position of the first graphical element coincides with a position of the second graphical element.
  • said potential action is a selection of the second graphical element, and the method further comprises, upon detecting said potential action, instructing, for each image of at least some of the images subsequently acquired by the camera, to execute the pose tracking algorithm and rescale the subsequently transformed coordinates with a view to updating a position of the second graphical element, as previously done in respect of the first graphical element.
  • the position of the first graphical element is updated according to an attractor field of the second graphical element, in addition to said rescaled coordinates of the target keypoint.
  • said attractor field is devised so as to be minimal at a centre of the second graphical element and beyond an attraction field boundary surrounding the second graphical element, and maximal at an intermediate distance between the centre of the second graphical element and the attraction field boundary.
  • the invention is embodied as a computerized system for enabling a contactless user interface, wherein the computerized system comprises a camera, a display device, and processing means.
  • the processing means are configured to execute a graphical user interface to display a graphical element on the display device, instruct the camera to repeatedly acquire images of a user, and perform, for each image of at least some of the images acquired, each of the following steps.
  • a pose tracking algorithm is executed to update coordinates of keypoints of the user in a base reference frame corresponding to said each image, the keypoints including a target keypoint.
  • anthropometric data are computed from the updated coordinates.
  • the anthropometric data capture a relative size of an anthropometric feature of the user in said each image.
  • a coordinate transformation is applied to the updated coordinates to obtain transformed coordinates of the keypoints in a reference frame of the user.
  • the transformed coordinates are subsequently rescaled according to said anthropometric data.
  • a position of the graphical element in a reference frame of the display is then updated according to the rescaled coordinates of the target keypoint, and a signal encoding the updated position is sent to the display device to accordingly move the graphical element displayed.
  • a final aspect concerns a computer program product for enabling a contactless user interface.
  • the computer program product comprises a computer readable storage medium having program instructions embodied therewith.
  • the program instructions are executable by processing means of a computerized system, which further comprises a camera and display device, to cause the computerized system to perform steps as described above in respect of the present methods.
  • FIG.1 is a diagram illustrating interactions between a user, a camera, processing means, and a display device, to enable a contactless user interface, as in embodiments
  • FIG.2 shows a user in the physical world.
  • Several virtual features are mapped onto the user by an algorithm, where such features include a reference plane, keypoints, and an anthropometric feature (a reference distance), as in embodiments;
  • FIG. 3 schematically illustrates the mapped features in a base reference frame corresponding to an image acquired by a camera, as involved in embodiments. I.e., the features correspond to a user as seen by the camera;
  • FIG.4 shows an application user interface run on a display device, where the application user interface displays two objects that can be controlled by a user in a contactless manner, as in embodiments;
  • FIG. 5 schematically illustrates an attractor field set around a displayed graphical element to ease user interactions, as involved in embodiments;
  • FIG.6 is a plot of a cosine function, which is used to model the intensity of the attractor along a direction of a vector extending from the centre of a displayed graphical element to the position of a cursor, in embodiments;
  • FIG. 7 is a flowchart illustrating high-level steps of a method of enabling a contactless user interface, according to embodiments.
  • FIG.8 is a flowchart illustrating high-level steps of a monitoring algorithm, allowing a user to trigger an action in respect of a displayed graphical element, as in embodiments;
  • FIG. 9 schematically represents a general-purpose computerized system, suited for implementing one or more method steps as involved in embodiments of the invention;
  • FIG. 10 is a block diagram schematically illustrating selected components of, and modules implemented by, a computerized system for enabling a contactless user interface, according to embodiments.
  • the accompanying drawings show simplified representations of devices or parts thereof, as involved in embodiments.
  • This aspect concerns a computer-implemented method of enabling a contactless (or touchless) user interface.
  • the method can for instance be performed by a computerized system 1, e.g., a computerized unit 200 (e.g., a smartphone, a tablet, a laptop) such as shown in FIG. 9.
  • the system 1 comprises a processing unit 3, a camera 2, and a display device 4, as illustrated in FIG.1.
  • the computerized unit 200 integrates all the components (processing means, display device, and camera).
  • the steps of the method are executed, or triggered, by processing means, which are operatively connected to the camera 2 and the display device 4.
  • some of these steps may possibly be offloaded to a paired processor or a connected device.
  • E.g., such steps may be remotely performed at a server connected to a unit 200.
  • the system 1 actually concerns another aspect, which is described later in detail.
  • the method involves the execution S10 of a graphical user interface (GUI) 7.
  • GUI 7 may be executed as part of, or jointly with, an application or an operating system (OS) executing on the computerized system 1.
  • the application may be any application, e.g., a game, a word processing application.
  • this application is a digital health application, such as a digital coaching application.
  • executing the GUI causes to display one or more graphical elements 71, 72 on the display device 4, as illustrated in FIG.4.
  • graphical elements 71, 72 may include any graphical control element, such as a cursor, a mouse pointer, a widget, or any graphical image (e.g., an icon representing any object), with which the user interacts or through which the user interacts with other graphical elements displayed by the GUI, as in embodiments discussed later.
  • the elements 71, 72 include a cursor (or mouse pointer) 71 and a virtual object 72.
  • a typical application scenario of the present methods is one in which the user wants to select the object 72 through the cursor 71, through touchless interactions.
  • the method comprises instructing a camera 2 to repeatedly acquire S20 images of a user 5. Images acquired by the camera 2 are then exploited to enable contactless user actions with the GUI, as illustrated in the diagram of FIG. 1.
  • the method instructs to perform a series of steps, for each image of at least some of the images acquired. That is, not all images may be exploited; some images may be dropped (S25, S35), for reasons discussed later.
  • the following describes the series of steps performed to enable the contactless user actions.
  • As seen in the flow of FIG.7, such steps include executing S30 a pose tracking algorithm.
  • Such an algorithm is known per se: it uses machine learning techniques to infer keypoints on the user’s body.
  • the aim is to identify, and then update, coordinates of keypoints 51 – 54 of the user 5.
  • the keypoints correspond to, i.e., are associated with, defined locations on the user body.
  • the identified keypoints notably include a target keypoint 53, which is used to control a GUI element 71, as explained below.
  • Such keypoints are defined in a base reference frame FB corresponding to each image.
  • the base reference frame FB of a given image is schematically depicted in FIG.3.
  • the reference frame FB is distinct from the reference frame F U of the user, as also seen in FIG.3.
  • a base reference frame is a frame corresponding to each image; it defines the keypoint space.
  • a further reference frame FS is shown in FIG.4, which corresponds to the frame subtended by the display screen of the display device 4.
  • the method computes S50 anthropometric data from the updated coordinates of the keypoints.
  • such anthropometric data capture a relative size of an anthropometric feature 6 of the user 5 in each image considered. That is, this relative size is an apparent dimension of the anthropometric feature 6, as seen by the camera.
  • the apparent size of the anthropometric feature is defined in the base reference frame FB.
  • the size of the anthropometric feature is computed based on coordinates of at least two of the keypoints, as defined in the base reference frame F B of each image.
  • This anthropometric feature is an anthropometric characteristic, which can notably be an ear-to-ear distance, an eye-to-eye distance, a shoulder-to-shoulder distance, a torso area, a leg length, etc.
  • this anthropometric feature corresponds to a physical characteristic of the user.
  • the anthropometric feature 6 corresponds to the ear-to-ear distance 45, which is calculated from the coordinates of selected keypoints 51, as defined in the base reference frame F B .
  • the method further applies S40, S60 a coordinate transformation to the updated coordinates of the keypoints.
  • the aim is to obtain transformed coordinates of the keypoints in the reference frame F U of the user 5.
  • the method rescales S70 the transformed coordinates of the keypoints (i.e., the coordinates as now defined in the reference frame FU) according to the relative size of said anthropometric feature, i.e., in accordance with said anthropometric data. All keypoints of interest can be rescaled, including the target keypoint 53.
  • This step is pivotal, as it makes it possible to compensate for changes in the user’s distance to the camera, as further discussed later.
  • the method can now update S80 a position of the graphical element 71 in the reference frame FS of the display according to the rescaled coordinates of the target keypoint 53.
  • the method sends S90 a signal encoding the updated position to the display device 4 to accordingly move the graphical element 71 as displayed in the display screen of the display device 4.
  • the same operations are repeated for each image of said at least some of the images acquired by the camera 2. This way, a touchless user interface control is enabled, which does not require any physical contact between the user 5 and any physical device.
  • the proposed method can be regarded as a real-time gesture tracking method, though not limited to hand, or head movements. Unlike other markerless systems relying on depth cameras or inertial measurement units (IMUs), here the method can typically rely on a conventional camera 2, such as a basic RGB camera. That is, the proposed method does not require any specific equipment.
  • a digital health application is a scenario in which a person engages in home-based physical therapy.
  • a software program is run on a computer equipped with an RGB webcam, so as to implement a method as described above, with a view to providing guidance and motivation during the performance of physical therapy exercises.
  • the person typically needs to stand at a distance from the computer, making it inconvenient to use traditional input devices like a keyboard or mouse.
  • the proposed method enables the user to interact in a contactless manner with the software and the computer, eliminating the need to physically touch any device to interact with the software.
  • this person may come to pause and resume a session, pause and resume playback of a video, adjust the sound volume, or exit the application.
  • Pose tracking algorithms mostly assume images obtained from RGB cameras. Thus, various types of pose tracking algorithms (which are known per se) may be contemplated for use in the present context. A pose tracking algorithm makes it possible to extract keypoints 51 – 54 using machine learning methods, which are also known per se.
  • Suitable pose tracking algorithms include the so-called BlazePose GHUM (https://arxiv.org/abs/2210.06551), D3DP , MixSTE (https://arxiv.org/abs/2203.00859), U- CondDGConv@GT_2D_Pose (https://arxiv.org/pdf/2107.07797v2.pdf), and UGCN (https://arxiv.org/pdf/2004.13985v1.pdf) algorithms.
  • the pose tracking algorithm may be restricted to the extraction of a few, predetermined keypoints 51, 52, 52r, 53.
  • the keypoints 51 – 54 are spatial locations, i.e., points of particular interest in each image.
  • the target keypoint 53 is a selected point on the body; this keypoint 53 is used to guide the graphical element 71 (e.g., a cursor, as assumed in the following) on the display screen. That is, movement of this point 53 in the physical (i.e., real) world results in movement of the cursor 71 on the display screen. Outputs from the pose tracking algorithm are exploited in such a manner that the target keypoint 53 remains the same across successive images; it always corresponds to the same location on the user body.
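  • As a purely illustrative sketch (not part of the claimed method), the following Python fragment shows how such keypoints might be obtained with the MediaPipe implementation of BlazePose, one of the algorithms cited above; the specific library calls, landmark names, and the choice of the right wrist as target keypoint are assumptions made for the example only:

```python
# Illustrative only: extracting normalized keypoint coordinates (base frame FB)
# with the MediaPipe BlazePose implementation (API details are assumptions).
import cv2                # frames grabbed with OpenCV from a standard RGB webcam
import mediapipe as mp    # provides the BlazePose pose tracking model

mp_pose = mp.solutions.pose

def extract_keypoints(frame_bgr, pose):
    """Return a dict of (x, y, z) keypoint coordinates, with x and y normalized
    to [0, 1] in the image (frame FB), or None if no person is detected."""
    results = pose.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if results.pose_landmarks is None:
        return None
    lm = results.pose_landmarks.landmark
    names = ["LEFT_EAR", "RIGHT_EAR",            # bound the reference segment 45
             "LEFT_SHOULDER", "RIGHT_SHOULDER",  # used for the user frame FU
             "LEFT_HIP", "RIGHT_HIP",
             "RIGHT_WRIST"]                      # assumed target keypoint 53
    return {n: (lm[mp_pose.PoseLandmark[n]].x,
                lm[mp_pose.PoseLandmark[n]].y,
                lm[mp_pose.PoseLandmark[n]].z) for n in names}

if __name__ == "__main__":
    cap = cv2.VideoCapture(0)
    with mp_pose.Pose(model_complexity=1) as pose:
        ok, frame = cap.read()
        if ok:
            print(extract_keypoints(frame, pose))
    cap.release()
```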
  • a reference frame is associated with a given thing (e.g., an image, the user body, etc.), to which a coordinate system is attached.
  • a reference frame is defined by a reference point at the origin and a reference point at a unit distance along each of the n coordinate axes.
  • each reference frame is preferably defined in accordance with a Cartesian coordinate system, as usual in the field.
  • the coordinate transformation performed at step S60 is a reference frame transformation.
  • the computerized system 1 has no information about the absolute distance of the user from the camera 2. Because the scheme used to compute the coordinates in the base reference frame F B is agnostic to the true, absolute distance of the user from the camera, the present method relies on the change in the relative size (i.e., the apparent dimension) of the anthropometric feature 6 of the user in the base reference frame to compensate for changes in the user’s distance to the camera 2.
  • the transformed coordinates are preferably rescaled by applying a scaling function.
  • Various scaling functions may be contemplated. In general, such functions can be defined as an algorithmic procedure, which involves a numerical or analytical calculation. Preferably, this function is defined analytically, to speed up calculations.
  • the scaling function used may be a rational polynomial function or a simple polynomial function f.
  • This function may notably involve one or more constants c1, c2, ..., d0, s0, where d0 (respectively s0) corresponds to an initial relative length (respectively an initial relative area) of a constant (i.e., rigid) anthropometric feature 6, determined in accordance with at least two (respectively at least three) selected points of the keypoints 51 – 54. Two points define a line, while three coplanar points define a boundary of an area.
  • the variables d and s are updated for each image.
  • the anthropometric data capture a relative size of a geometric element, which is bounded by at least two keypoints 51, selected from the identified keypoints 51 – 54, see FIGS.2 and 3.
  • the transformed coordinates of the keypoints are rescaled S70 according to a scaling function, which is preferably defined analytically.
  • the scaling function may take as argument a ratio of a reference size to the relative size of the anthropometric feature 6, as computed in the base reference frame F B for each image. For example, one may rely on a simple scaling factor of the form d0/d.
  • the target keypoint 53 is preferably distinct from the two keypoints bounding the line segment of length d.
  • the two keypoints 51 used to compute d correspond to ears in the example of FIGS.2 and 3.
  • the relative size of the anthropometric feature is a relative length 45 of the line segment bounded by the two selected keypoints 51, such that the ratio d0/d is a ratio of a reference length d 0 to the length d as computed for each image considered. More generally, though, these two keypoints may also correspond to eyes, iliac crests, greater trochanters, etc., and accordingly bound a rigid line segment of the user body.
  • the length d is essentially constant in the user reference frame, subject to small (i.e., negligible) variations. In other words, said length is a distance between two keypoints on the user body that are essentially rigidly connected.
  • the reference size d0 corresponds to a value of the size d as computed for an initial image of the images repeatedly acquired by the camera 2.
  • d is typically updated at a rate of 15 to 20 fps, as further discussed below.
  • the length d 0 may actually be any predefined length.
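  • As a minimal sketch of this rescaling step (the function and symbol names below are illustrative assumptions; the simple d0/d factor is the scaling factor mentioned above):

```python
# Illustrative sketch: rescaling keypoint coordinates by the ratio d0/d of the
# reference length to the current apparent length of the anthropometric feature.
import numpy as np

def relative_length(kp_a, kp_b):
    """Apparent length d of the reference segment (e.g., the ear-to-ear
    distance 45), computed from two keypoints in the base frame FB."""
    return float(np.linalg.norm(np.asarray(kp_a) - np.asarray(kp_b)))

def rescale(coords_fu, d, d0):
    """Rescale coordinates expressed in the user frame FU by d0/d (step S70)."""
    return {name: np.asarray(xyz) * (d0 / d) for name, xyz in coords_fu.items()}

# d0 comes from an initial image, d from the current image; when the user steps
# back, d shrinks and the factor d0/d grows, compensating for the distance change.
d0 = relative_length((0.45, 0.30, 0.0), (0.55, 0.30, 0.0))
d = relative_length((0.47, 0.31, 0.0), (0.53, 0.31, 0.0))
print(rescale({"target": (0.10, 0.25, 0.02)}, d, d0))
```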
  • a more efficient alternative is to undistort the coordinates of only the keypoints of interest.
  • the scaling function is devised to compensate for the optical distortion of the camera 2, at least partly, whereby the correction is performed S70 upon rescaling. That is, the scaling function can be purposely altered to compensate for the optical distortion of the hardware camera.
  • the function can again be devised as a rational polynomial function.
  • a polynomial form is preferred, given that radial distortion can be modelled using polynomial equations.
  • the parameters of the function can be adjusted to partly compensate for optical distortion.
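  • The sketch below illustrates one way to undistort only the keypoints of interest with OpenCV; the intrinsic matrix and distortion coefficients are hypothetical values standing in for a prior camera calibration, which the description does not prescribe:

```python
# Illustrative sketch: undistorting only selected keypoints (pixel coordinates)
# rather than the whole image; K and dist are hypothetical calibration results.
import numpy as np
import cv2

K = np.array([[900.0, 0.0, 640.0],
              [0.0, 900.0, 360.0],
              [0.0,   0.0,   1.0]])            # hypothetical camera intrinsics
dist = np.array([-0.25, 0.08, 0.0, 0.0, 0.0])  # hypothetical k1, k2, p1, p2, k3

def undistort_keypoints(pixel_points):
    pts = np.asarray(pixel_points, dtype=np.float64).reshape(-1, 1, 2)
    # Passing P=K keeps the corrected points in pixel coordinates.
    return cv2.undistortPoints(pts, K, dist, P=K).reshape(-1, 2)

print(undistort_keypoints([(100.0, 80.0), (640.0, 360.0)]))
```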
  • the camera 2 is configured to repeatedly acquire S20 images at a given frame rate R1, which is usually fixed. Still, the pose tracking algorithm can be instructed to execute S30 for each image of only a subset of the repeatedly acquired images.
  • the pose tracking algorithm is executed S30 at an average rate R2 that is strictly less than R1 (R2 < R1).
  • R2 is chosen so that R1/2 ≤ R2 ≤ 2R1/3. That is, the refresh rate of the relative size of the anthropometric feature (e.g., the length d) is lowered to allow more time for the pose tracking algorithm to perform inferences, i.e., to infer the semantic keypoints.
  • the actual frame rate of the camera will typically be equal to 30 fps, whereas the rate for refreshing d is preferably set between 15 and 20 fps.
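  • A minimal frame-subsampling sketch is shown below; the choice of running inference on every second frame (30 fps camera, 15 fps inference) is an assumption consistent with the figures above:

```python
# Illustrative sketch: the camera runs at R1 = 30 fps, while the pose tracking
# algorithm is executed on every 2nd frame only (R2 = 15 fps, i.e. R1/2 <= R2 <= 2R1/3).
import cv2

INFERENCE_EVERY = 2   # assumed subsampling factor

def frame_loop(process_keypoints, max_frames=300):
    cap = cv2.VideoCapture(0)
    for index in range(max_frames):
        ok, frame = cap.read()
        if not ok:
            break
        if index % INFERENCE_EVERY == 0:   # other frames are dropped (S25, S35)
            process_keypoints(frame)       # placeholder for steps S30-S90
    cap.release()
```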
  • the Nyquist sampling theorem advocates using a sampling frequency that is at least twice the highest frequency of interest.
  • R0 is a maximal user interaction frequency of interest for the problem at issue
  • R0 can be regarded as the Nyquist or folding frequency for the problem at hand.
  • a coordinate transformation is applied (at step S60, FIG. 7) to the updated coordinates of the keypoints to obtain transformed coordinates of the keypoints in the user reference frame F U .
  • This coordinate transformation is a transformation between the base reference frame FB and the user reference frame FU.
  • this transformation can be simply determined based on the updated coordinates (in the base reference frame FB) of keypoints 52, 52r selected from the keypoints 51 – 54.
  • the selected keypoints 52, 52r include three non-collinear keypoints, which define a reference plane of the user, as shown in FIGS.2 and 3.
  • the method further comprises determining S40, for each image, the coordinate transformation between the frame FB and frame FU based on the updated coordinates of the selected keypoints 52, 52r, as computed in the frame FB.
  • the coordinate transformation is then applied S60 for the reference frame F B .
  • unit vectors can be defined for the user reference frame FU, from the selected keypoints 52, 52r. One of these keypoints defines the origin of the user reference frame FU.
  • a similar transformation is determined and then applied S60 for each new reference frame FB (i.e., for each new image considered). So, in practice, for each new image considered, the transformation (e.g., a translation and a rotation) is determined S40 and applied S60. Then, one rescales S70 the keypoint coordinates according to the anthropometric feature size, and subsequently maps S80 the target keypoint to a position on the display screen.
  • the frame FU (attached to the user's body) can be determined from the 3D coordinates of selected keypoints, as defined in the frame F B .
  • the selected keypoints 52, 52r define axes x', y', z' of the frame F U , as illustrated in FIG. 3.
  • Corresponding unit vectors are defined from the axes, given an origin 52r.
  • x', y', z' denote both the axes of the frame FU and the corresponding unit vectors in FIGS.2 and 3.
  • x, y, z denote both the axes of the frame F B and the corresponding unit vectors in FIG.3.
  • points 52, 52r can easily be determined thanks to outputs from the pose tracking algorithm. For example, see FIG.
  • y' can be taken as the axis that is parallel to the shoulder-to-shoulder axis, i.e., the axis extending through the keypoints corresponding to the right shoulder and the left shoulder.
  • the axis x' can be taken as a vector parallel to the shoulder-to-hip axis.
  • the remaining axis/unit vector z' is perpendicular to the plane spanned by x' and y'. So, a suitable keypoint selection makes it possible to determine the coordinates of the three unit vectors x', y', z' in the frame F B .
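  • A possible NumPy sketch of this construction follows; the orthonormalization order (shoulder axis first, then the shoulder-to-hip direction) is an assumption, and the keypoint values are made up for the example:

```python
# Illustrative sketch: building orthonormal unit vectors x', y', z' of the user
# frame FU from three non-collinear keypoints expressed in the base frame FB.
import numpy as np

def user_frame_basis(r_shoulder, l_shoulder, r_hip):
    r_shoulder, l_shoulder, r_hip = map(np.asarray, (r_shoulder, l_shoulder, r_hip))
    y_axis = l_shoulder - r_shoulder                  # parallel to the shoulder-to-shoulder axis
    y_axis = y_axis / np.linalg.norm(y_axis)
    x_raw = r_hip - r_shoulder                        # roughly parallel to the shoulder-to-hip axis
    x_axis = x_raw - np.dot(x_raw, y_axis) * y_axis   # remove the component along y'
    x_axis = x_axis / np.linalg.norm(x_axis)
    z_axis = np.cross(x_axis, y_axis)                 # perpendicular to the (x', y') plane
    return x_axis, y_axis, z_axis

x_p, y_p, z_p = user_frame_basis((0.60, 0.30, 0.00), (0.40, 0.30, 0.00), (0.58, 0.60, 0.05))
# With the canonical basis for FB, [e1 e2 e3] is the identity, so the rotation
# matrix R = [e1' e2' e3'] . [e1 e2 e3]^T is simply the stacked basis vectors:
R = np.column_stack([x_p, y_p, z_p])
print(R)
```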
  • the coordinate transformation determined at step S40 preferably combines a rotation and a translation.
  • the rotation can be defined based on Euler angles between the base reference frame FB and the reference frame of the user 5.
  • the translation can be defined based on coordinates of a reference keypoint in the base reference frame FB.
  • a reference plane (x', y') is defined by three non-collinear keypoints 52, 52r, which are selected from the keypoints 51 – 54, see FIG. 2.
  • the reference keypoint 52r is selected among the keypoints 52 and corresponds to the origin of the user reference frame.
  • the reference keypoint 52r and the reference plane (x', y') are used to define the user reference frame FU, which translates and rotates together with the user 5. So, the user reference frame FU has its origin at the reference keypoint 52r, and its XY plane coincides with the reference plane.
  • the movement of the cursor 71 is isolated (i.e., made independent) from any rotation of the user around the human longitudinal axis (i.e., cephalocaudal axis) when standing upright in the real-world space. That is, a movement of the target keypoint 53 results in a consistent movement of the cursor 71, independently of any rotation of the user 5 in front of the camera 2.
  • the movement of the cursor is isolated from any translation of the user parallel to the camera plane. Rigid translations of the whole body in the XY plane do not result in undesired movements of the cursor 71. This improves user interactions and the user experience.
  • the rotation matrix can be computed as R = [e1' e2' e3'] · [e1 e2 e3]^T, where:
  • [e1' e2' e3'] is a matrix whose columns are the basis vectors of the x'y'z' coordinate system
  • [e1 e2 e3] is a matrix whose columns are the basis vectors of the xyz coordinate system
  • T denotes the transpose operation.
  • the vectors e1’, e2’, and e3’ are orthonormal.
  • the coordinate transformation is defined and applied S60 as a single matrix multiplication, as discussed in detail in Section 2.
  • This repetitive operation can advantageously be offloaded to a GPU, if any.
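  • A sketch of such a combined transformation is given below; the convention used (mapping base-frame coordinates into the user frame whose origin is the reference keypoint, via p' = R^T (p − origin)) is one possible choice, not a statement of the claimed formula:

```python
# Illustrative sketch: rotation + translation combined into one 4x4 homogeneous
# matrix, applied to all keypoints in a single matrix multiplication (GPU-friendly).
import numpy as np

def make_transform(R, origin_fb):
    T = np.eye(4)
    T[:3, :3] = R.T                            # assumed convention: p' = R^T (p - origin)
    T[:3, 3] = -R.T @ np.asarray(origin_fb)
    return T

def apply_transform(T, keypoints_fb):
    pts = np.asarray(keypoints_fb, dtype=float)             # shape (N, 3)
    homog = np.hstack([pts, np.ones((pts.shape[0], 1))])    # shape (N, 4)
    return (T @ homog.T).T[:, :3]                           # one matmul for all keypoints

R = np.eye(3)                                  # hypothetical rotation from step S40
origin = np.array([0.60, 0.30, 0.00])          # reference keypoint 52r in FB
keypoints = np.array([[0.50, 0.20, 0.01], [0.45, 0.25, 0.00]])
print(apply_transform(make_transform(R, origin), keypoints))
```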
  • the GUI 7 may display one or more additional graphical elements 72 on the display device 4, beside the first graphical element 71.
  • various algorithmic recipes can be applied to allow the user to perform an action on the second graphical element 72, thanks to control exerted through the first element 71.
  • the method may further comprise running S130 – S140 a monitoring algorithm to detect (S130: Yes; S140: Yes) a potential action to be performed S150 on the second graphical element 72 based on a relative position of the first graphical element 71 and the second graphical element 72 in the reference frame FS of the display.
  • Said action may for instance be a mere selection of the element 72, the triggering of an action (e.g., starting an execution), or the display of a drop-down menu relating to the element 72.
  • the monitoring algorithm is executed concurrently with the main flow, see step S110 in FIG. 8. It may notably involve a computer vision algorithm and/or a timer triggering the desired action.
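  • A minimal sketch of the timer-based variant is given below; the dwell time and the hit radius are illustrative assumptions, not values taken from the description:

```python
# Illustrative sketch: dwell-time monitoring. The action on the second graphical
# element is triggered once the cursor has hovered over it for long enough.
import time

DWELL_SECONDS = 1.5   # assumed dwell duration
HIT_RADIUS = 40       # assumed pixel radius within which the positions "coincide"

class DwellMonitor:
    def __init__(self):
        self.hover_start = None

    def update(self, cursor_xy, target_xy):
        """Return True when the action (step S150) should be triggered."""
        dx = cursor_xy[0] - target_xy[0]
        dy = cursor_xy[1] - target_xy[1]
        over = (dx * dx + dy * dy) ** 0.5 <= HIT_RADIUS
        if not over:
            self.hover_start = None
            return False
        if self.hover_start is None:
            self.hover_start = time.monotonic()
        return (time.monotonic() - self.hover_start) >= DWELL_SECONDS
```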
  • the method instructs, for each image subsequently obtained (S20, S25) from the camera 2, to execute S30 the pose tracking algorithm and rescale S70 the subsequently transformed coordinates with a view to updating S80, S90 a position of the second graphical element 72, as so far done in respect of the first graphical element 71.
  • attractors can advantageously be used to ease manipulations by the user.
  • the position of the first graphical element 71 can be updated S80 according to an attractor field assigned to the second graphical element 72.
  • the update step S80 additionally makes use of the rescaled coordinates of the target keypoint 53, as explained earlier.
  • Attractors can be contemplated, which are known per se. However, it is preferred to rely on an attractor field that is devised so as to be minimal at the centre of the second graphical element 72 and beyond an attraction field boundary surrounding the second graphical element 72, and be maximal at an intermediate distance between the centre of the second graphical element 72 and the attraction field boundary, as illustrated in FIG.5.
  • Use can for instance be made of a cosine function, which is maximal at the half distance to the centre of the second element 72, see FIG. 6.
  • the function actually depicted is ½ (1 − cos(2πu)), where u corresponds to the normalized radial distance (x-axis), while the value of the function reflects the attraction magnitude (y-axis).
  • FIG.7 shows a preferred flow.
  • the GUI starts running at step S10.
  • a loop is started at step S20, whereby the camera repeatedly acquires images. Only a subset of the images produced are fed S30 to the pose tracking algorithm, for it to identify (first time) and then update keypoint coordinates. I.e., some of the images are dropped (S25: No, S35).
  • the coordinate transformation is determined at step S40 and then applied (step S60) to the updated coordinates of the keypoints, so as to obtain transformed coordinates of the keypoints in the user reference frame.
  • anthropometric data are computed at step S50; the transformed coordinates of the keypoints are rescaled at step S70, according to anthropometric data.
  • the position of a graphical element in the display reference frame is accordingly updated at step S80 and a corresponding signal is sent S90 to the display device to move the graphical element displayed.
  • a monitoring algorithm can be run in parallel to the flow of FIG. 7 (the latter corresponding to Flow #1, step S110 in FIG. 8).
  • the relative position of the first and second graphical elements is monitored at step S120, which is continually performed.
  • a computer vision algorithm is run, and/or a timer is triggered.
  • the method checks S140 whether a certain criterion is met. I.e., is the computer vision algorithm able to identify a predetermined gesture? Has a predefined time period elapsed? If so (S140: Yes), an action (e.g., selection, start execution, display drop-down menu, etc.) is triggered at step S150 in respect of the second graphical element.
  • the algorithm keeps on monitoring S120 the relative position of the graphical elements, irrespective of the outcomes of steps S130 and S140.
  • the invention can be embodied as a computerized system 1 for enabling a contactless user interface.
  • the computerized system 1 comprises a camera 2, a display device 4, and processing means 230.
  • the system 1 may be a desktop computer 3, a smartphone, a tablet, a laptop, etc.
  • the processing means 230 are configured to execute S10 a GUI 7, instruct the camera 2 to repeatedly acquire S20 images of a user 5, and perform steps (see, e.g., steps S25 – S90 in FIG. 7) as described earlier in reference to the present methods.
  • a first module 231 is run to execute the pose tracking algorithm, which causes the coordinate update module 232 to update the coordinates of the keypoints in the base reference frame FB.
  • the module 233 is run to compute anthropometric data from the updated coordinates.
  • the module 234 computes and applies the coordinate transformation to obtain transformed coordinates of the keypoints in the user reference frame FU.
  • the transformed coordinates are then rescaled (according to the anthropometric data) by the module 235, whereby a tracking module 236 can update positions of the graphical elements to be displayed in the display reference frame FS in accordance with the rescaled coordinates of the target keypoint. Additional modules 237 – 238 may be involved.
  • an optical distortion module 237 may be run to correct keypoint coordinates.
  • a monitoring module 238 may be executed, whereby a computer vision module 238 may be run to identify specific user gestures.
  • this module 238 may involve a timer to trigger an action in respect of a graphical element displayed, as explained earlier.
  • the signals consumed and produced by the various modules 231 – 238 and components 2, 4 transit through input/output (I/O) management units 260 – 270, see also FIG. 9.
  • the updated positions of the graphical elements can be sent to the display device 4, for it to accordingly display movements of elements manipulated by the user, in operation. Additional features of the present computerized systems 1 and computer program products are described in section 3.
  • a computer vision algorithm processes the camera’s images in real-time, identifying and tracking a set of keypoints 51 – 54 on the body, such as the hands, wrists, shoulders, hips, knees, ankles, and feet.
  • the position of a selected keypoint (i.e., the target keypoint 53) is used to control the position of the cursor 71 on the display screen.
  • the proposed method allows an intuitive use of the cursor 71 through the following innovations: • The movement of the cursor is independent of the rotation of the user in space; • The movement of the cursor is independent of translations of the user parallel to the camera plane; and • The movement of the cursor is independent of translations of the user in the direction perpendicular to the camera plane.
  • when the position of the cursor overlaps with that of a button or any other interactive element on the display screen, the user can interact with it in two ways: (a) by performing a specific hand gesture (e.g., opening and closing the hand, which would correspond to a mouse click); or (b) by overlapping the cursor’s position with that of the button for a predetermined amount of time.
  • the two mechanisms may possibly be concurrently implemented.
  • the reference distance corresponds to the anthropometric feature, i.e., a distance between two keypoints on the body that are rigidly connected (e.g., the ears, the eyes).
  • 2D images from the camera 2 can be associated with a base reference frame F B .
  • the keypoints are generated by the pose tracking algorithm. They consist of XYZ coordinates for a set of points on the human body, identified from the 2D image of the RGB camera 2. The XY coordinates of each keypoint are set between 0 and 1, this depending on the position of the keypoint in the image. The top left corner is [0, 0], and the bottom right corner is [1, 1].
  • the z- coordinate encodes information in the direction perpendicular to the camera frame.
  • the display screen is associated with a further frame F S .
  • the user can control the position of the cursor 71 by moving the target keypoint 53.
  • the aim is to map the position of a target point 53 in the physical space to the position of the cursor 71 on the display screen and allow the user to interact with elements displayed on the display screen.
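  • One possible mapping is sketched below; the gain, the screen anchor, and the clamping are tuning choices assumed for the example rather than features of the description:

```python
# Illustrative sketch: mapping the rescaled (x, y) coordinates of the target
# keypoint (user frame FU) to pixel coordinates in the display frame FS.
def to_screen(target_xy, screen_w, screen_h, gain=2.0, anchor=(0.5, 0.5)):
    x, y = target_xy
    px = (anchor[0] + gain * x) * screen_w
    py = (anchor[1] + gain * y) * screen_h
    # Clamp so the cursor 71 always stays on the display screen.
    return (min(max(int(px), 0), screen_w - 1),
            min(max(int(py), 0), screen_h - 1))

print(to_screen((0.08, -0.05), 1920, 1080))   # e.g. cursor position in pixels
```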
  • the movement of the cursor should ideally be isolated from a rotation of the user in the physical space. I.e., a movement of the target keypoint should ideally result in a consistent movement of the cursor, independently of any rotation of the user with respect to the camera; • Furthermore, the movement of the cursor should ideally be isolated from translations of the user parallel to the camera plane. Rigid translations of the whole body in the XY direction should not result in movement of the displayed cursor 71; • The movement of the cursor should be isolated from the translation of the user along the camera axis, i.e., in the direction perpendicular to the camera plane.
  • a movement of the target keypoint should result in a consistent movement of the cursor, independently of the distance of the user from the camera plane; and • Lack of fine motor control, latency in the computation of the keypoints, and/or jittering of the keypoints, may prevent the user from guiding the pointer 71 towards a desired interactive element in a smooth and intuitive way.
  • the following describes an adaptive mapping function to compensate for translations and rotations of the user.
  • the proposed solution is to transform the coordinates of the target keypoint 53 from the base reference frame FB, defined by axes x, y, z (unit vectors of the frame FB are defined along said axes), to the user reference frame FU attached to the user’s body, and defined by axes x', y', z'.
  • unit vectors of the frames FB and FU are defined along the axes x, y, z and x', y', z', respectively, whereby the frames FB and FU can be respectively referred to as the xyz and x'y'z' frames.
  • the coordinate transformation sought can be defined, in homogeneous coordinates, as [p'; 1] = M · [p; 1], where p collects the keypoint coordinates in the frame FB, p' collects the corresponding coordinates in the frame FU, and M = [R, t; 0, 1] combines a 3 × 3 rotation matrix R and a 3 × 1 translation vector t.
  • the matrix M on the right-hand side represents the combined transformation.
  • the rotation matrix R can be defined using a combination of rotations around the x-axis, y-axis, and z-axis.
  • This transformation ensures that the cursor’s position is not affected by translations or rotations of the user in space, but only by relative movements between the target keypoint and the reference keypoint.
  • the following describes an adaptive mapping function to estimate the depth thanks to a rigid anthropometric feature, so as to isolate the cursor movements from any translation of the user along the camera axis.
  • since the z-coordinates in the frame FB do not indicate the absolute distance of the user from the camera, we may advantageously rely on the change in size of a rigid object as a proxy for the user’s distance from the camera.
  • This object can for instance be chosen as the segment 45 (also called reference segment, see FIG.3).
  • the distance d is regularly updated, at a rate of 15 to 20 fps. As a result, the target position is not affected by the distance of the user from the camera.
  • Attractors may prevent the user from guiding the pointer towards a desired element 72 in a smooth and intuitive way.
  • Targets can be any elements on the display screen, such as buttons and widgets, which the user can interact with through the cursor 71.
  • the added attractors may act as gravity fields, pulling the cursor towards the centre of the interactive element when the cursor is within an attraction distance and thus caught by the attraction field. Care should be taken to carefully design such attractors to prevent the user from accidentally pressing the wrong button or getting stuck in the element 72.
  • This yields a corrected pointer position, defined as the sum of the raw pointer position and an attractor offset.
  • Use can advantageously be made of a cosine velocity attractor for the computation of this offset, as a function of the normalized radial distance of the pointer from the attractor centre, where c denotes the coordinates of the centre of the attractor (see the black dot at the centre of the shape 72 in FIG.4), p points to the pointer position (see the white dot in FIG.4), and pb points to the projection (see the patterned dot in FIG. 4) of p to the attraction field border 80. Note, the projection is actually defined as the extension of p to the boundary 80.
  • pb points to a point where the boundary 80 is intersected by the axis passing through p and originating from the centre of the element 72.
  • the constant C is adjusted to control the rate of attraction of the cursor 71 to the centre of graphical element 72.
  • the behaviour of the above attractor can be described as follows.
  • the cursor 71 is not attracted when located at the centre of the attractor (to prevent it from getting stuck at the shape 72) or at (or beyond) the boundary 80 of the attractor. However, the cursor 71 is attracted when located between the boundary 80 and the centre of the attractor. Note, this relation affects the velocity of the offset.
  • a cosine-like function determines the amplitude of the attraction field, see FIG.5.
  • this function is equal to 0 at u = 0 and at u = 1, and equal to 1 at u = 0.5.
  • Such values correspond to the centre of the attractor, the edge of the attractor, and the edge of the attraction field, i.e., the boundary 80, assuming the attraction field has a diameter that is twice the diameter of the attractor.
  • this function is clipped to 0 for u > 1 and undefined for u < 0.
  • the accumulation of offset over multiple interactions with attractive elements may sometimes lead to the pointer drifting off the display screen. To counteract this, we may advantageously apply an exponential decay to the offset.
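  • The sketch below combines a cosine-shaped attractor offset with such an exponential decay; the gain C, the decay factor, and the exact update rule are illustrative assumptions consistent with the behaviour described above (zero pull at the centre and at the field boundary, maximal pull halfway):

```python
# Illustrative sketch: cosine attractor offset with exponential decay of the
# accumulated offset (to prevent the pointer from drifting off the screen).
import math

C = 8.0        # assumed gain controlling the rate of attraction
DECAY = 0.95   # assumed per-frame exponential decay of the accumulated offset

def update_offset(pointer, centre, field_radius, offset):
    dx, dy = pointer[0] - centre[0], pointer[1] - centre[1]
    dist = math.hypot(dx, dy)
    u = dist / field_radius                     # normalized radial distance
    if 0.0 < u < 1.0:
        amplitude = 0.5 * (1.0 - math.cos(2.0 * math.pi * u))  # 0 at u=0 and u=1, 1 at u=0.5
        offset = (offset[0] - C * amplitude * dx / dist,       # pull towards the centre
                  offset[1] - C * amplitude * dy / dist)
    return (offset[0] * DECAY, offset[1] * DECAY)

off = (0.0, 0.0)
for _ in range(5):   # a few frames with the pointer inside the attraction field
    off = update_offset(pointer=(520, 400), centre=(500, 380), field_radius=80, offset=off)
print(off)
```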
  • the method steps S10 – S150 described earlier in reference to FIGS.7 and 8, are implemented in (or triggered by) software, e.g., one or more executable programs, executed by processing means 230 of the computerized system 1, which may include one or more computerized units such as depicted in FIG. 9.
  • the system 1 consists of a single unit 200, such as a smartphone, a tablet, a laptop, or a desktop computer, which integrates all required components, starting with the camera 2, the display device 4, and processing means 230.
  • Computerized devices and components can be suitably configured for implementing embodiments of the present invention as described herein.
  • the methods described herein are partly non-interactive, i.e., partly automated.
  • Automated parts of such methods can be implemented in software, hardware, or a combination thereof.
  • automated parts of the methods described herein are implemented in software, as a service or an executable program (e.g., an application), the latter executed by suitable digital processing devices.
  • a typical computerized device (or unit) 200 may include a processor 230 and a memory 250 (possibly including several memory units) coupled to one or more memory controllers 240.
  • the processor 230 is a hardware device for executing software loaded in a main memory of the device.
  • the processor can be any custom made or commercially available processor.
  • the processor may notably be a central processing unit (CPU), as assumed in FIG. 9.
  • the memory 250 of the unit 200 typically includes a combination of volatile memory elements (e.g., random access memory) and non-volatile memory elements, e.g., a solid-state device.
  • the software in memory may include one or more separate programs, each of which comprises executable instructions for implementing functions as described herein.
  • the software in the memory includes methods described herein in accordance with exemplary embodiments and a suitable OS.
  • the OS essentially controls the execution of other computer (application) programs and provides scheduling, input-output control, file and data management, memory management, and communication control and related services. It may further control the distribution of tasks to be performed by various processing units.
  • the computerized unit 200 can further include a display controller 282 coupled to a display device 4.
  • the computerized unit 200 further includes a network interface 290 or transceiver for coupling to a network (not shown).
  • the computerized unit 200 will typically include one or more input and/or output (I/O) devices 2, 210, 220 (or peripherals, including the camera 2) that are communicatively coupled via a local I/O controller 260.
  • a system bus 270 interfaces all components.
  • the local interface may include address, control, and/or data connections to enable appropriate communications among the aforementioned components.
  • the I/O controller 135 may have additional elements, which are omitted for simplicity, such as controllers, buffers (caches), drivers, repeaters, and receivers, to allow data communication. P187721PC00 27
  • one or more processing units 230 execute software stored within the memory of the computerized unit 200, to communicate data to and from the memory 250 and/or the storage unit 255 (e.g., a hard drive and/or a solid-state memory), and to generally control operations pursuant to software instruction.
  • the methods described herein and the OS are read (in whole or in part) by the processing elements, typically buffered therein, and then executed.
  • Computer readable program instructions described herein can be downloaded to processing elements from a computer readable storage medium, via a network, for example, the Internet and/or a wireless network.
  • a network adapter card or network interface 290 may receive computer readable program instructions from the network and forward such instructions for storage in a computer readable storage medium 255 interfaced with the processing means 230.
  • These computer readable program instructions may be provided to one or more processing elements 230 as described above, to produce a machine, such that the instructions, which execute via the one or more processing elements, create means for implementing the functions or acts specified in the blocks of the flowcharts of FIGS. 7, 8 and the block diagram of FIG. 10.
  • These computer readable program instructions may also be stored in a computer readable storage medium.
  • the flowchart and the block diagram in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of the computerized unit 200, methods of operating it, and computer program products, according to various embodiments of the present invention.
  • each computer-implemented block in the flowcharts or the block diagram may represent a (sub)module, or a set of instructions, which comprise(s) executable instructions for implementing the functions or acts specified therein.
  • the functions or acts mentioned in the blocks may occur out of the order specified in the figures. For example, two blocks shown in succession may actually be executed in parallel, concurrently, or even in reverse order, depending on the functions involved and the algorithmic optimization retained. Note also that each block, and combinations of blocks, can be distributed among special-purpose hardware components.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Geometry (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention is notably directed to a computer-implemented method of enabling a contactless user interface. The method comprises executing a graphical user interface to display a graphical element on a display device and instructing a camera to repeatedly acquire images of a user. The method further comprises instructing to perform the following steps, for each image of at least some of the images acquired. First, a pose tracking algorithm is executed to update coordinates of keypoints (51 – 54) of the user in a base reference frame corresponding to said each image. I.e., each image considered can be associated with a corresponding base frame (F B ). The keypoints notably include a target keypoint (53), which will be used to guide the graphical element. Second, from the updated coordinates, the method computes anthropometric data, which capture a relative size of an anthropometric feature (45) of the user in said each image. The method further applies a coordinate transformation to the updated coordinates to obtain transformed coordinates of the keypoints in a reference frame (F U ) of the user. The reference frame is "attached" to the user body. Next, the method rescales the transformed coordinates according to said anthropometric data. The method then updates a position of the graphical element in a reference frame of the display according to the rescaled coordinates of the target keypoint. Finally, it sends a signal encoding the updated position to the display device to accordingly move the graphical element displayed. The proposed method works with conventional RGB cameras (no depth camera is needed) and is particularly well suited to games and, more specifically, to digital health applications. The invention further concerns related computerized devices and computer program products.

Description

P187721PC00 1 ENABLING A CONTACTLESS USER INTERFACE FOR A COMPUTERIZED DEVICE EQUIPPED WITH A STANDARD CAMERA TECHNICAL FIELD The invention relates in general to the field of computer-implemented methods, computerized systems, and computer program products for enabling a contactless user interface, e.g., using a standard RGB camera. In particular, it is directed to methods relying on changes in the relative size (i.e., an apparent dimension) of a given anthropometric feature of the user to compensate for changes in the user’s distance to the camera, as well as other user movements including rotations and translations parallel to the camera plane. BACKGROUND Gesture recognition aims at interpreting human gestures (typically originating from the face or hand) through techniques of computer vision and image processing. Concepts of contactless (also called <touchless=) user interfaces have been proposed, which are based on gesture control. The goal is to allow a user to control a computer via body motion and gestures without physically touching any device, e.g., a keyboard, a mouse, a display screen, or any peripheral device. For example, kinetic user interfaces (KUIs) have been proposed, which allow users to interact with computing devices through the motion of objects and bodies. Such interfaces typically involve sensorised gloves, stereo cameras, and gesture-based controllers. A drawback of such approaches is that they require specific equipment. As a result, they are not suited for a widespread adoption of digital technology such as digital healthcare. In that respect, the concept of <therapy at home= is slow to take hold because patients lack guidance and motivation. Adding hardware constraints will only impede further the uptake of home therapy. Therefore, there is a need to enable a contactless user interface with simple equipment (i.e., laptop, tablet, or smartphone), which does not require additional devices, and is simple to use for patients. P187721PC00 2 SUMMARY According to a first aspect, the present invention is embodied as a computer-implemented method of enabling a contactless user interface. The method comprises executing a graphical user interface to display a graphical element on a display device and instructing a camera to repeatedly acquire images of a user. The method further comprises instructing to perform the following steps, for each image of at least some of the images acquired. First, a pose tracking algorithm is executed to update coordinates of keypoints of the user in a base reference frame corresponding to said each image. I.e., each image considered can be associated with a corresponding base frame. The keypoints notably include a target keypoint, which will be used to guide the graphical element. Second, from the updated coordinates, the method computes anthropometric data, which capture a relative size of an anthropometric feature of the user in said each image. The method further applies a coordinate transformation to the updated coordinates to obtain transformed coordinates of the keypoints in a reference frame of the user. The reference frame of the user can be regarded as a frame that is <attached= to the user body. Next, the method rescales the transformed coordinates according to said anthropometric data. The method then updates a position of the graphical element in a reference frame of the display according to the rescaled coordinates of the target keypoint. 
Finally, it sends a signal encoding the updated position to the display device to accordingly move the graphical element displayed. This way, a touchless user interface control is enabled, whereby the proposed method does not require any physical contact between the user and any physical device. Unlike other markerless systems relying on depth cameras or inertial measurement units, here the method can typically rely on images acquired by a conventional camera, e.g., a standard RGB camera. Still, changes in the relative size (i.e., the apparent dimension) of the anthropometric feature of the user are used as a proxy to compensate for user translations along the camera axis. Additional user movements (e.g., rotations and translations parallel to the camera plane) can possibly be compensated, as in embodiments discussed below. In embodiments, the computed anthropometric data include a relative size of a geometric element, which is bounded by at least two selected ones of the keypoints. The transformed coordinates are simply rescaled according to a scaling function taking as argument a ratio of a reference size to said relative size as computed in the base reference frame for said each image. Such a ratio of the reference size to the relative size is efficient and quick to compute, and reflects the optical reality (as seen by the camera) well, inasmuch as the apparent size of an object is inversely proportional to its distance to the camera. Preferably, said relative size is a relative length of a line segment bounded by two selected ones of the keypoints, and said ratio is a ratio of a reference length to said length as computed for said each image. In embodiments, the reference size corresponds to a value of said relative size as computed for an initial image of the repeatedly acquired images. In simpler, more efficient variants, the reference size is determined in accordance with one or each of a width and a height of said each image. In embodiments, the method further comprises, for said each image, applying a correction to compensate for an optical distortion (by the camera) of the updated coordinates of the keypoints in the base reference frame. In preferred embodiments, the camera is configured to repeatedly acquire said images at a given frame rate R1. The pose tracking algorithm is instructed to execute for each image of only a subset of the repeatedly acquired images, at an average rate R2 that is strictly less than R1. Preferably, R1/2 ≤ R2 ≤ 2 R1/3. This way, more time is allowed for the pose tracking algorithm to infer the keypoints. In embodiments, the method further comprises, for said each image, determining the coordinate transformation between the base reference frame and the reference frame of the user based on the updated coordinates of selected ones of the keypoints of the user in the base reference frame. The selected ones of the keypoints include three non-collinear keypoints, which define the reference frame of the user. Preferably, the applied coordinate transformation combines a rotation and a translation, the rotation is defined based on Euler angles between said base reference frame and the reference frame of the user, and the translation is defined based on coordinates of a reference keypoint in the base reference frame. The reference keypoint is selected from the three non-collinear keypoints and corresponds to the origin of the reference frame of the user.
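By way of illustration only, the following minimal Python sketch shows how such a relative size and the corresponding scaling ratio could be computed from two keypoints. The function names (relative_size, scale_ratio) and the 2D tuple representation of keypoints are illustrative assumptions, not features of the claimed method.

    import math

    def relative_size(kp_a, kp_b):
        # Apparent length, in the base reference frame, of the line segment
        # bounded by two selected keypoints (e.g., the two ears).
        return math.hypot(kp_a[0] - kp_b[0], kp_a[1] - kp_b[1])

    def scale_ratio(d, d_ref):
        # Ratio of the reference size to the current relative size (d0/d).
        return d_ref / d

    # Usage: d0 may be taken from the first image processed, then kept fixed:
    #   d0 = relative_size(first_left_ear, first_right_ear)
    #   factor = scale_ratio(relative_size(left_ear, right_ear), d0)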
The coordinate transformation may advantageously be defined and applied as a single matrix multiplication. Where possible, the single matrix multiplication is optionally offloaded to a graphics processor unit, in the interest of computational speed. In preferred embodiments, said graphical element is a first graphical element. Still, the execution of the graphical user interface may cause to display additional graphical elements on the display device, in particular a second graphical element. P187721PC00 4 Preferably, the method further comprises running a monitoring algorithm to detect a potential action to be performed on the second graphical element based on a relative position, in the reference frame of the display, of the first graphical element and the second graphical element. In embodiments, the monitoring algorithm includes one or each of: a computer vision algorithm to detect a particular gesture of the user triggering said action; and a timer triggering said action based on a time duration during which a position of the first graphical element coincides with a position of the second graphical element. Preferably, said potential action is a selection of the second graphical element, and the method further comprises, upon detecting said potential action, instructing, for each image of at least some of the images subsequently acquired by the camera, to execute the pose tracking algorithm and rescale the subsequently transformed coordinates with a view to updating a position of the second graphical element, as previously done in respect of the first graphical element. In preferred embodiments, the position of the first graphical element is updated according to an attractor field of the second graphical element, in addition to said rescaled coordinates of the target keypoint. Preferably, said attractor field is devised so as to be minimal at a centre of the second graphical element and beyond an attraction field boundary surrounding the second graphical element, and maximal at an intermediate distance between the centre of the second graphical element and the attraction field boundary. According to another aspect, the invention is embodied as a computerized system for enabling a contactless user interface, wherein the computerized system comprises a camera, a display device, and processing means. The processing means are configured to execute a graphical user interface to display a graphical element on the display device, instruct the camera to repeatedly acquire images of a user, and perform, for each image of at least some of the images acquired, each of the following steps. First, a pose tracking algorithm is executed to update coordinates of keypoints of the user in a base reference frame corresponding to said each image, the keypoints including a target keypoint. Second, anthropometric data are computed from the updated coordinates. The anthropometric data capture a relative size of an anthropometric feature of the user in said each image. Moreover, a coordinate transformation is applied to the updated coordinates to obtain transformed coordinates of the keypoints in a reference frame of the user. The transformed coordinates are subsequently rescaled according to said P187721PC00 5 anthropometric data. A position of the graphical element in a reference frame of the display is then updated according to the rescaled coordinates of the target keypoint, and a signal encoding the updated position is sent to the display device to accordingly move the graphical element displayed. 
A final aspect concerns a computer program product for enabling a contactless user interface. The computer program product comprises a computer readable storage medium having program instructions embodied therewith. The program instructions are executable by processing means of a computerized system, which further comprises a camera and display device, to cause the computerized system to perform steps as described above in respect of the present methods. BRIEF DESCRIPTION OF THE DRAWINGS These and other objects, features and advantages of the present invention will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings. The illustrations are for clarity in facilitating one skilled in the art in understanding the invention in conjunction with the detailed description. In the drawings: FIG.1 is a diagram illustrating interactions between a user, a camera, processing means, and a display device, to enable a contactless user interface, as in embodiments; FIG.2 shows a user in the physical world. Several virtual features are mapped onto the user by an algorithm, where such features include a reference plane, keypoints, and an anthropometric feature (a reference distance), as in embodiments; FIG. 3 schematically illustrates the mapped features in a base reference frame corresponding to an image acquired by a camera, as involved in embodiments. I.e., the features correspond to a user as seen by the camera; FIG.4 shows an application user interface run on a display device, where the application user interface displays two objects that can be controlled by a user in a contactless manner, as in embodiments; FIG. 5 schematically illustrates an attractor field set around a displayed graphical element to ease user interactions, as involved in embodiments; P187721PC00 6 FIG.6 is a plot of a cosine function, which is used to model the intensity of the attractor along a direction of a vector extending from the centre of a displayed graphical element to the position of a cursor, in embodiments; FIG. 7 is a flowchart illustrating high-level steps of a method of enabling a contactless user interface, according to embodiments; FIG.8 is a flowchart illustrating high-level steps of a monitoring algorithm, allowing a user to trigger an action in respect of a displayed graphical element, as in embodiments; FIG. 9 schematically represents a general-purpose computerized system, suited for implementing one or more method steps as involved in embodiments of the invention; FIG. 10 is a block diagram schematically illustrating selected components of, and modules implemented by, a computerized system for enabling a contactless user interface, according to in embodiments. The accompanying drawings show simplified representations of devices or parts thereof, as involved in embodiments. Similar or functionally similar elements in the figures have been allocated the same numeral references, unless otherwise indicated. Computerized systems, methods, and computer program products embodying the present invention will now be described, by way of non-limiting examples. DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION The following description is structured as follows. General embodiments and high-level variants are described in section 1. Section 2 addresses particularly preferred embodiments. Section 3 concerns technical implementation details. 
Note, the present method and its variants are collectively referred to as the <present methods=. All references Sn refer to methods steps of the flowcharts of FIGS.7 and 8, while numeral references pertain to devices, components, and concepts involved in embodiments of the present invention. P187721PC00 7 1. General embodiments and high-level variants A first aspect of the invention is now described in detail in reference to FIGS.1 – 4, and 7. This aspect concerns a computer-implemented method of enabling a contactless (or touchless) user interface. The method can for instance be performed by a computerized system 1, e.g., a computerized unit 200 (e.g., a smartphone, a tablet, a laptop) such as shown in FIG. 9. The system 1 comprises a processing unit 3, a camera 2, and a display device 4, as illustrated in FIG.1. Preferably, the computerized unit 200 integrates all the components (processing means, display device, and camera). In all cases, the steps of the method are executed, or triggered, by processing means, which are operatively connected to the camera 2 and the display device 4. However, some of these steps may possibly be offloaded to a paired processor or a connected device. E.g., such steps may be remotely performed at a server connected to a unit 200. The system 1 actually concerns another aspect, which is described later in detail. The method involves the execution S10 of a graphical user interface (GUI) 7. The GUI 7 may be executed as part of, or jointly with, an application or an operating system (OS) executing on the computerized system 1. The application may be any application, e.g., a game, a word processing application. Preferably, however, this application is a digital health application, such as a digital coaching application. As usual, executing the GUI causes to display one or more graphical elements 71, 72 on the display device 4, as illustrated in FIG.4. Such graphical elements 71, 72 may include any graphical control element, such as a cursor, a mouse pointer, a widget, or any graphical image (e.g., an icon representing any object), with which the user interacts or through which the user interacts with other graphical elements displayed by the GUI, as in embodiments discussed later. In the example of FIG.4, the elements 71, 72 include a cursor (or mouse pointer) 71 and a virtual object 72. A typical application scenario of the present methods in one in which the user wants to select the object 72 through the cursor 71, through touchless interactions. To that aim, the method comprises instructing a camera 2 to repeatedly acquire S20 images of a user 5. Images acquired by the camera 2 are then exploited to enable contactless user actions with the GUI, as illustrated in the diagram of FIG. 1. To this aim, the method instructs to perform a series of steps, for each image of at least some of the images acquired. That is, not all images may be exploited; some images may be dropped (S25, S35), for reasons discussed later. The following describes the series of steps performed to enable the contactless user actions. P187721PC00 8 As seen in the flow of FIG.7, such steps include executing S30 a pose tracking algorithm. Such an algorithm is known per se: it uses machine learning techniques to infer keypoints on the user’s body. The aim is to identify, and then update, coordinates of keypoints 51 – 54 of the user 5. The keypoints correspond to, i.e., are associated with, defined locations on the user body. 
Importantly, the identified keypoints notably include a target keypoint 53, which is used to control a GUI element 71, as explained below. Such keypoints are defined in a base reference frame FB corresponding to each image. The base reference frame FB of a given image is schematically depicted in FIG.3. The reference frame FB is distinct from the reference frame FU of the user, as also seen in FIG.3. A base reference frame is a frame corresponding to each image; it defines the keypoint space. Conversely, the user reference frame FU can be regarded as a frame that is <attached= to the user's body, see FIG. 3. A further reference frame FS is shown in FIG.4, which corresponds to the frame subtended by the display screen of the display device 4. Next, the method computes S50 anthropometric data from the updated coordinates of the keypoints. Interestingly, such anthropometric data capture a relative size of an anthropometric feature 6 of the user 5 in each image considered. That is, this relative size is an apparent dimension of the anthropometric feature 6, as seen by the camera. Note, the apparent size of the anthropometric feature is defined in the base reference frame FB. More precisely, the size of the anthropometric feature is computed based on coordinates of at least two of the keypoints, as defined in the base reference frame FB of each image. This anthropometric feature is an anthropometric characteristic, which can notably be an ear-to-ear distance, an eye-to-eye distance, a shoulder-to-shoulder distance, a torso area, a leg length, etc. In other words, this anthropometric feature corresponds to a physical characteristic of the user. In the example of FIGS.2 and 3, the anthropometric feature 6 corresponds to the ear-to-ear distance 45, which is calculated from the coordinates of selected keypoints 51, as defined in the base reference frame FB. The method further applies S40, S60 a coordinate transformation to the updated coordinates of the keypoints. The aim is to obtain transformed coordinates of the keypoints in the reference frame FU of the user 5. In turn, the method rescales S70 the transformed coordinates of the keypoints (i.e., the coordinates as now defined in the reference frame FU) according to the relative size of said anthropometric feature, i.e., in accordance with said anthropometric data. All keypoints of interest can be rescaled, including the target keypoint 53. This step is pivotal, P187721PC00 9 as it allows to compensate for changes in the user’s distance to the camera, as further discussed later. Having done so, the method can now update S80 a position of the graphical element 71 in the reference frame FS of the display according to the rescaled coordinates of the target keypoint 53. Once the position of the graphical element 71 has been updated S80, the method sends S90 a signal encoding the updated position to the display device 4 to accordingly move the graphical element 71 as displayed in the display screen of the display device 4. The same operations are repeated for each image of said at least some of the images acquired by the camera 2. This way, a touchless user interface control is enabled, which does not require any physical contact between the user 5 and any physical device. Interestingly, changes in the relative size (i.e., the apparent dimension) of the anthropometric feature 6 of the user are used as a proxy to compensate for user translations along the camera axis. 
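Purely to fix ideas, the per-image processing just outlined could be organized as in the sketch below. All names (pose_tracker.infer, user_frame_transform, the choice of ears and a wrist as keypoints) are hypothetical placeholders for steps S30 to S80 and are not mandated by the description; relative_size is as sketched earlier, and user_frame_transform stands for the frame transformation discussed further below.

    def process_frame(image, pose_tracker, d0, screen_w, screen_h):
        kps = pose_tracker.infer(image)                        # S30: keypoints in the base frame FB
        d = relative_size(kps["left_ear"], kps["right_ear"])   # S50: anthropometric data
        to_user = user_frame_transform(kps)                    # S40: FB -> FU transformation
        scale = d0 / d                                         # S70: rescaling factor
        kps_user = {k: tuple(scale * c for c in to_user(p))    # S60 + S70: transform, then rescale
                    for k, p in kps.items()}
        tx, ty, _ = kps_user["right_wrist"]                    # target keypoint guiding the cursor
        return int(tx * screen_w), int(ty * screen_h)          # S80: position in the display frame FS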
Additional user movements (e.g., rotations and translation parallel to the camera plane) can possibly be compensated, as in embodiments discussed later. The proposed method can be regarded as a real-time gesture tracking method, though not limited to hand, or head movements. Unlike other markerless systems relying on depth cameras or inertial measurement units (IMUs), here the method can typically rely on a conventional camera 2, such as a basic RGB camera. That is, the proposed method does not require any specific equipment. Rather, it can typically be performed using a standard computer (e.g., a laptop) a smartphone, or a tablet equipped with a standard camera. As such, the proposed method is particularly well suited to games and, more specifically, to digital health applications. An illustration of a digital health application is a scenario in which a person engages in home-based physical therapy. In this scenario, a software program is run on a computer equipped with an RGB webcam, so as to implement a method as described above, with a view to providing guidance and motivation during the performance of physical therapy exercises. To be visible to the camera, the person typically needs to stand at a distance from the computer, making it inconvenient to use traditional input devices like a keyboard or mouse. In such situations, the proposed method enables the user to interact in a contactless manner with the software and the computer, eliminating the need to physically touch any device to interact with the software. For example, this person may come to pause and resume a session, pause and resume playback of a video, adjust the sound volume, or exit the application. P187721PC00 10 Pose tracking algorithms mostly assume images obtained from RGB cameras. Thus, various types of pose tracking algorithms (which are known per se) may be contemplated for use in the present context. A pose tracking algorithm makes it possible to extract keypoints 51 – 54 using machine learning methods, which are also know per se. Examples of suitable pose tracking algorithms include the so-called BlazePose GHUM
(https://arxiv.org/abs/2210.06551), D3DP, MixSTE (https://arxiv.org/abs/2203.00859), U-CondDGConv@GT_2D_Pose (https://arxiv.org/pdf/2107.07797v2.pdf), and UGCN (https://arxiv.org/pdf/2004.13985v1.pdf) algorithms. Alternatively, an off-the-shelf algorithm may be adapted specifically to the present purpose, as the present inventors did. E.g., the pose tracking algorithm may be restricted to the extraction of a few, predetermined keypoints 51, 52, 52r, 53. As usual in gesture tracking, the keypoints 51 – 54 map "fixed points" of the user body 5; such points are fixed in the user reference frame FU but move in the base reference frame FB. The keypoints 51 – 54 are spatial locations of particular interest in each image. Thus, the user keypoints 51 – 54 can be regarded as "semantic keypoints", to the extent they are associated with defined points of the user body, corresponding to defined locations, such as the right shoulder, left shoulder, left hip, etc. Such locations are invariant to image transformations (e.g., rotation, translation, distortion, etc.). The target keypoint 53 is a selected point on the body; this keypoint 53 is used to guide the graphical element 71 (e.g., a cursor, as assumed in the following) on the display screen. That is, movement of this point 53 in the physical (i.e., real) world results in movement of the cursor 71 on the display screen. Outputs from the pose tracking algorithm are exploited in such a manner that the target keypoint 53 remains the same across successive images; it always corresponds to the same location on the user body. Note, a "reference frame" (also called "frame of reference", or "frame" for short) refers to an abstract coordinate system, whose origin, orientation, and scale are specified by reference points. However, unlike an abstract coordinate system, a reference frame is associated with a given thing (e.g., an image, the user body, etc.), to which a coordinate system is attached. As a result, it is possible to assign a location and time to any event in each reference frame. In practice, however, the terminologies "reference frame", "frame", and "coordinate system" are used interchangeably. In the present context, the "base reference frame" FB refers to an image acquired by the camera 2 (each image considered gives rise to a respective frame FB), the "reference frame of the user" FU (or "user reference frame") relates to the user, and the reference frame FS of the display refers to the screen plane of the display device 4. Each frame can be defined in dimension n (e.g., n = 2 or 3), which requires n + 1 reference points. E.g., when using Cartesian coordinates, a reference frame is defined by a reference point at the origin and a reference point at a unit distance along each of the n coordinate axes. While the dimension n is normally equal to 2 or 3, certain widgets such as scroll bars may occasionally require 1D movements only. In the present context, each reference frame is preferably defined in accordance with a Cartesian coordinate system, as usual in the field. The coordinate transformation performed at step S60 is a reference frame transformation. By definition, in the present context, the computerized system 1 has no information about the absolute distance of the user from the camera 2.
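As one possible (non-limiting) way to obtain such keypoints in practice, the snippet below uses the MediaPipe Python bindings of BlazePose; the parameters and the choice of landmarks are merely illustrative, and any of the pose tracking algorithms cited above could be substituted.

    import cv2
    import mediapipe as mp

    mp_pose = mp.solutions.pose

    cap = cv2.VideoCapture(0)  # standard RGB webcam, no depth sensor required
    with mp_pose.Pose(static_image_mode=False, model_complexity=1) as pose:
        ok, frame = cap.read()
        if ok:
            results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if results.pose_landmarks:
                lm = results.pose_landmarks.landmark
                left_ear = lm[mp_pose.PoseLandmark.LEFT_EAR]
                right_ear = lm[mp_pose.PoseLandmark.RIGHT_EAR]
                # x, y are normalized to [0, 1]; z is relative to the hips' midpoint
                print(left_ear.x, left_ear.y, left_ear.z, right_ear.x, right_ear.y)
    cap.release()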
Because the scheme used to compute the coordinates in the base reference frame FB is agnostic to the true, absolute distance of the user from the camera, the present method relies on the change in the relative size (i.e., the apparent dimension) of the anthropometric feature 6 of the user in the base reference frame to compensate for changes in the user’s distance to the camera 2. I.e., such changes are used as a proxy to compensate for user translations along the camera axis, i.e., the axis extending along a direction that is perpendicular to the camera plane. By definition, this anthropometric feature 6 has an essentially constant size in the reference frame FU of the user. Thus, the movement of the displayed graphical element 71 is made independent of a possible translation of the user in a direction perpendicular to the camera plane. As a result, a movement of the target keypoint 53 results in a consistent movement of the graphical element 71 as displayed by the display device 4, irrespective of the distance of the user from the camera 2. This makes it possible to efficiently track user gestures and, thus, enable a realistic, contactless user interface, irrespective of changes in the position of the user perpendicularly to the camera plane (i.e., movements along the camera axis). The transformed coordinates are preferably rescaled by applying a scaling function. Various scaling functions may be contemplated. In general, such functions can be defined as an algorithmic procedure, which involves a numerical or analytical calculation. Preferably, this function is defined analytically, to speed up calculations. For example, the scaling function used may be a rational polynomial function or a simple polynomial function f. This function may notably involve one or more constants c1, c2, …, d0, P187721PC00 12 s0, where d0 (respectively s0) corresponds to an initial relative length (respectively an initial relative area) of a constant (i.e., rigid) anthropometric feature 6, determined in accordance with at least two (respectively at least three) selected points of the keypoints 51 – 54. Two points define a line, while three coplanar points define a boundary of an area. For example, when using a 1D anthropometric feature of length d, the function may be of the form f(d) = (c1 + c2 d0)/(c1 + c2 d) or f = c1 + c2 d0/d. When using a 2D anthropometric feature of area s, the function may be of the form f(s) = (c1 + c2 s0)/(c1 + c2 s) or f(s) = c1 + c2 s0/s. More sophisticated approaches can be contemplated, which combine s and d variables, such as f(d, s) = (c1 + c2 d0 + c3 √s0)/(c1 + c2 d + c3 √s) or f(d, s) = c1 + c2 d0/d + c3 √s0/s. The variables d and s are updated for each image. In all of the above examples, the anthropometric data capture a relative size of a geometric element, which is bounded by at least two keypoints 51, selected from the identified keypoints 51 – 54, see FIGS.2 and 3. And the transformed coordinates of the keypoints are rescaled S70 according to a scaling function, which is preferably defined analytically. As illustrated above, the scaling function may take as argument a ratio of a reference size to the relative size of the anthropometric feature 6, as computed in the base reference frame FB for each image. For example, one may rely on a simple scaling factor of the form d0/d. Note, the target keypoint 53 is preferably distinct from the two keypoints bounding the line segment of length d. 
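A minimal sketch of such scaling functions is given below; the constants are placeholders, and the combined form is one possible reading of the expression above (the square root taken over the area ratio s0/s, so that areas and lengths scale consistently).

    def f_1d(d, d0, c1=0.0, c2=1.0):
        # f(d) = c1 + c2 * d0 / d, using a 1D anthropometric feature of length d
        return c1 + c2 * d0 / d

    def f_2d(s, s0, c1=0.0, c2=1.0):
        # f(s) = c1 + c2 * s0 / s, using a 2D anthropometric feature of area s
        return c1 + c2 * s0 / s

    def f_combined(d, s, d0, s0, c1=0.0, c2=0.5, c3=0.5):
        # f(d, s) = c1 + c2 * d0/d + c3 * sqrt(s0/s)
        return c1 + c2 * d0 / d + c3 * (s0 / s) ** 0.5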
As said, the two keypoints 51 used to compute d correspond to the ears in the example of FIGS.2 and 3. I.e., the relative size of the anthropometric feature is a relative length 45 of the line segment bounded by the two selected keypoints 51, such that the ratio d0/d is a ratio of a reference length d0 to the length d as computed for each image considered. More generally, though, these two keypoints may also correspond to eyes, iliac crests, greater trochanters, etc., and accordingly bound a rigid line segment of the user body. The length d is essentially constant in the user reference frame, subject to small (i.e., negligible) variations. In other words, said length is a distance between two keypoints on the user body that are essentially rigidly connected. Using an inverse function of the form d0/d reflects the optical reality as seen by the camera 2 well, inasmuch as the apparent size of an object is inversely proportional to the distance to the camera, owing to the perspective. That is, considering the laws of optics and perspective, an inverse function of the form d0/d will accurately compensate for perpendicular translations. In embodiments, the reference size d0 corresponds to a value of the size d as computed for an initial image of the images repeatedly acquired by the camera 2. For example, d0 is the length of the reference distance as measured at time t = 0 upon camera setup, and d is the same distance as measured at the current time point. Note, d is typically updated at a rate of 15 to 20 fps, as further discussed below. A convenient alternative is to fix d0, instead of initially measuring it. The length d0 may actually be any predefined length. For example, d may be measured relative to a fixed length of each image, e.g., the width or height of each image considered, or the length of its diagonal (i.e., the "size" of the screen). If necessary, a correction is applied to each image to compensate for an optical distortion (by the camera 2) of the updated coordinates of the keypoints, as obtained in the base reference frame FB. The radial distortion can for instance be described as follows:
xd = xu (1 + k1 r² + k2 r⁴ + k3 r⁶ + …), and
yd = yu (1 + k1 r² + k2 r⁴ + k3 r⁶ + …),
where xd, yd are the distorted coordinates, xu, yu are the undistorted coordinates, r is the distance from the optical centre (the origin of the camera axis on the camera plane), and k1, k2, k3 are the radial distortion coefficients. By determining the distortion coefficients, it is possible to invert the distortion equation to undistort a full image. A more efficient alternative is to undistort coordinates of the sole keypoints of interest. Besides, tangential distortion occurs when the camera's lens is not perfectly aligned with the image sensor. It can be modelled using additional parameters, i.e.,
xd = xu + 2 p1 x y + p2 (r² + 2 x²), and
yd = yu + p1 (r² + 2 y²) + 2 p2 x y,
where xd, yd are the distorted coordinates, xu, yu are the undistorted coordinates, x, y are the coordinates after correcting for radial distortion, and p1, p2 are the tangential distortion coefficients. So, applying the inverse of the above distortion equations makes it possible to correct tangential distortion. In each case, corrections can be applied to the sole keypoints of interest instead of the full image. In variants, though, such corrections can also be systematically applied to full images, as usual in the art.
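For instance, the sole keypoints of interest can be undistorted with OpenCV, assuming the intrinsic matrix and distortion coefficients have been obtained from a prior calibration of the camera (the numerical values below are placeholders, not calibration data from the application).

    import numpy as np
    import cv2

    # Hypothetical intrinsics and distortion coefficients (k1, k2, p1, p2, k3)
    K = np.array([[800.0,   0.0, 320.0],
                  [  0.0, 800.0, 240.0],
                  [  0.0,   0.0,   1.0]])
    dist = np.array([0.12, -0.05, 0.001, 0.0005, 0.0])

    def undistort_keypoints(pts_px):
        # Undistort only the keypoints of interest (pixel coordinates), not the full image
        pts = np.asarray(pts_px, dtype=np.float64).reshape(-1, 1, 2)
        out = cv2.undistortPoints(pts, K, dist, P=K)  # P=K keeps the result in pixel units
        return out.reshape(-1, 2)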
In further variants, the scaling function is devised to compensate for the optical distortion of the camera 2, at least partly, whereby the correction is performed S70 upon rescaling. That is, the scaling function can be purposely altered to compensate for the optical distortion of the hardware camera. E.g., this function can be devised as a series of the form f(d) = c1 + c2 d0/d + c3 (d0/d)² + …, still taking d0/d as argument. Alternatively, the function can again be devised as a rational polynomial function. A polynomial form is preferred, given that radial distortion can be modelled using polynomial equations. In each case, though, the parameters of the function can be adjusted to partly compensate for optical distortion. The camera 2 is configured to repeatedly acquire S20 images at a given frame rate R1, which is usually fixed. Still, the pose tracking algorithm can be instructed to execute S30 for each image of only a subset of the repeatedly acquired images. That is, the pose tracking algorithm is executed S30 at an average rate R2 that is strictly less than R1 (R2 < R1). Preferably, R2 is chosen so that R1/2 ≤ R2 ≤ 2 R1/3. That is, the refresh rate of the relative size of the anthropometric feature (e.g., the length d) is lowered to allow more time for the pose tracking algorithm to perform inferences, i.e., to infer the semantic keypoints. For example, the actual frame rate of the camera will typically be equal to 30 fps, whereas the rate for refreshing d is preferably set between 15 and 20 fps. The Nyquist sampling theorem advocates using a sampling frequency that is at least twice the highest frequency of interest. E.g., if a minimal frequency of 5 Hz is needed to adequately track human gestures, then the relative size d would have to be refreshed at a rate of 10 fps, at least. Note, it is admittedly tricky to define the highest frequency of interest for human activities, inasmuch as this frequency depends on the desired application. However, considering usual human activities (such as walking, for example), typical frequencies are between 0 and 20 Hz, and 98% of the FFT amplitude is already contained below 10 Hz. Thus, one may for instance consider the Nyquist/folding frequency to be on the order of 8 – 10 Hz, which would advocate using 16 – 20 Hz as a sampling rate. That is, if R0 is a maximal user interaction frequency of interest for the problem at issue, one may further impose R2 ≥ 2 R0, where R0 can be regarded as the Nyquist or folding frequency for the problem at hand. As noted earlier, a coordinate transformation is applied (at step S60, FIG. 7) to the updated coordinates of the keypoints to obtain transformed coordinates of the keypoints in the user reference frame FU. This coordinate transformation is a transformation between the base reference frame FB and the user reference frame FU. Interestingly, this transformation can be simply determined based on the updated coordinates (in the base reference frame FB) of keypoints 52, 52r selected from the keypoints 51 – 54. The selected keypoints 52, 52r include three non-collinear keypoints, which define a reference plane of the user, as shown in FIGS.2 and 3. Thus, in embodiments, the method further comprises determining S40, for each image, the coordinate transformation between the frame FB and the frame FU based on the updated coordinates of the selected keypoints 52, 52r, as computed in the frame FB.
The coordinate transformation is then applied S60 to the coordinates obtained in the reference frame FB. Note, unit vectors can be defined for the user reference frame FU, from the selected keypoints 52, 52r. One of these keypoints defines the origin of the user reference frame FU. A similar transformation is determined and then applied S60 for each new reference frame FB (i.e., for each new image considered). So, in practice, for each new image considered, the transformation (e.g., a translation and a rotation) is determined S40 and applied S60. Then, one rescales S70 the keypoint coordinates according to the anthropometric feature size, and subsequently maps S80 the target keypoint to a position on the display screen. As noted above, the frame FU (attached to the user's body) can be determined from the 3D coordinates of selected keypoints, as defined in the frame FB. The selected keypoints 52, 52r define axes x', y', z' of the frame FU, as illustrated in FIG. 3. Corresponding unit vectors are defined from the axes, given an origin 52r. Note, x', y', z' denote both the axes of the frame FU and the corresponding unit vectors in FIGS.2 and 3. Similarly, x, y, z denote both the axes of the frame FB and the corresponding unit vectors in FIG.3. In practice, such points 52, 52r can easily be determined thanks to outputs from the pose tracking algorithm. For example, as seen in FIG. 3, if the frame FU is centred on keypoint 52r, corresponding to the left shoulder, y' can be taken as the axis that is parallel to the shoulder-to-shoulder axis, i.e., the axis extending through the keypoints corresponding to the right shoulder and the left shoulder. Conversely, the axis x' can be taken as a vector parallel to the shoulder-to-hip axis. Thus, the remaining axis/unit vector z' is perpendicular to the plane spanned by x' and y'. So, a suitable keypoint selection makes it possible to determine the coordinates of the three unit vectors x', y', z' in the frame FB. Such unit vectors can then be used to determine the Euler angles between the frames FB and FU. As evoked earlier, the coordinate transformation determined at step S40 preferably combines a rotation and a translation. And, as noted above, the rotation can be defined based on Euler angles between the base reference frame FB and the reference frame of the user 5. In addition, the translation can be defined based on coordinates of a reference keypoint in the base reference frame FB. A reference plane (x', y') is defined by three non-collinear keypoints 52, 52r, which are selected from the keypoints 51 – 54, see FIG. 2. The reference keypoint 52r is selected among the keypoints 52 and corresponds to the origin of the user reference frame. I.e., the reference keypoint 52r and the reference plane (x', y') are used to define the user reference frame FU, which translates and rotates together with the user 5. So, the user reference frame FU has its origin at the reference keypoint 52r, and its XY plane coincides with the reference plane. As a result, the movement of the cursor 71 is isolated (i.e., made independent) from any rotation of the user around the human longitudinal axis (i.e., cephalocaudal axis) when standing upright in the real-world space. That is, a movement of the target keypoint 53 results in a consistent movement of the cursor 71, independently of any rotation of the user 5 in front of the camera 2. Similarly, the movement of the cursor is isolated from any translation of the user parallel to the camera plane.
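A minimal sketch of how the unit vectors x', y', z' could be derived from the selected keypoints (left shoulder taken as origin 52r, as in the example above). The extra re-orthogonalization step is an implementation choice of this sketch, since the shoulder-to-shoulder and shoulder-to-hip directions are generally not exactly perpendicular.

    import numpy as np

    def user_frame_axes(left_shoulder, right_shoulder, left_hip):
        # 3D keypoint coordinates expressed in the base frame FB
        o = np.asarray(left_shoulder, dtype=float)               # origin 52r
        y_axis = np.asarray(right_shoulder, dtype=float) - o     # shoulder-to-shoulder direction (y')
        x_axis = np.asarray(left_hip, dtype=float) - o           # shoulder-to-hip direction (x')
        z_axis = np.cross(x_axis, y_axis)                        # normal to the reference plane (z')
        x_axis = np.cross(y_axis, z_axis)                        # re-orthogonalize x' against y' and z'
        e1, e2, e3 = (v / np.linalg.norm(v) for v in (x_axis, y_axis, z_axis))
        return e1, e2, e3, o   # unit vectors of FU (in FB) and the origin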
Rigid translations of the whole body in the XY plane thus do not result in undesired movements of the cursor 71. This improves user interactions and the user experience. In more detail, the Euler angles {α, β, γ} between two coordinate systems (say xyz and x'y'z') are derived from the rotation matrix between the two coordinate systems, knowing that:
α = atan2(R(1, 0)/cos(β), R(0, 0)/cos(β)),
β = atan2(−R(2, 0), sqrt(R(0, 0)² + R(1, 0)²)), and
γ = atan2(R(2, 1)/cos(β), R(2, 2)/cos(β)),
where R is the rotation matrix and R(i, j) denotes the value in row i and column j. The above equations assume a "ZYX" Euler angle convention (yaw, pitch, and roll). Now, to compute the rotation matrix R that transforms one coordinate system to another, information about the basis vectors of the two coordinate systems is needed. The basis vectors of the xyz coordinate system can be noted e1 = (e1x, e1y, e1z), e2 = (e2x, e2y, e2z), e3 = (e3x, e3y, e3z). In the present context, these may always be set to e1 = (1, 0, 0), e2 = (0, 1, 0), and e3 = (0, 0, 1). Similarly, for the second x'y'z' coordinate system, the basis vectors can be written e1' = (e1'x, e1'y, e1'z), e2' = (e2'x, e2'y, e2'z), and e3' = (e3'x, e3'y, e3'z). These are uniquely defined by the choice of the non-collinear keypoints chosen to define the second reference system. Next, the basis vectors can be normalized, to ensure they are unit vectors; i.e., if such vectors are not unit vectors, they are divided by their respective norm (i.e., their length). The rotation matrix R that transforms vectors from the xyz coordinate system to the x'y'z' coordinate system is subsequently computed as
R = [e1' e2' e3'] [e1 e2 e3]^T,
where [e1' e2' e3'] is a matrix whose columns are the basis vectors of the x'y'z' coordinate system, [e1 e2 e3] is a matrix whose columns are the basis vectors of the xyz coordinate system, and T denotes the transpose operation. The vectors e1', e2', and e3' are orthonormal. To avoid gimbal lock, a different convention can be used for the Euler rotation order, or a quaternion representation of the rotation can be used. Interestingly, the vectors and matrices involved can be padded so as to allow a single vector-matrix multiplication. Thus, in embodiments, the coordinate transformation is defined and applied S60 as a single matrix multiplication, as discussed in detail in Section 2. This repetitive operation can advantageously be offloaded to a GPU, if any.
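Since the basis e1, e2, e3 of the xyz frame is the identity, the rotation matrix reduces to the column-stacked user basis. A short sketch follows (ZYX convention, matching the formulas above; no special handling of the gimbal-lock case is shown):

    import numpy as np

    def rotation_matrix(e1p, e2p, e3p):
        # Columns are the (orthonormal) basis vectors of the x'y'z' frame, expressed in xyz
        return np.column_stack([e1p, e2p, e3p])

    def euler_zyx(R):
        beta = np.arctan2(-R[2, 0], np.hypot(R[0, 0], R[1, 0]))
        alpha = np.arctan2(R[1, 0] / np.cos(beta), R[0, 0] / np.cos(beta))
        gamma = np.arctan2(R[2, 1] / np.cos(beta), R[2, 2] / np.cos(beta))
        return alpha, beta, gamma   # yaw, pitch, roll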
A computer vision algorithm may be used to detect a particular gesture of the user 5, which gesture will trigger the action. Alternatively, or in addition, a timer may trigger this action based on a time duration during which a position of the first graphical element 71 coincides with a position of the second graphical element 72. Note, the positions of the first and second graphical elements 71, 72 may be considered to coincide if one of the elements (e.g., a mouse pointer) overlaps with the second (e.g., an icon). If the action performed consists of a selection of the second graphical element 72, then subsequent steps of the method are henceforth performed in respect of the element 72. That is, upon detecting a selection of the element 72, the method instructs, for each image subsequently obtained (S20, S25) from the camera 2, to execute S30 the pose tracking algorithm and rescale S70 the subsequently transformed coordinates with a view to updating S80, S90 a position of the second graphical element 72, as so far done in respect of the first graphical element 71. Referring now to FIG. 5, attractors can advantageously be used to ease manipulations by the user. In particular, the position of the first graphical element 71 can be updated S80 according to an attractor field assigned to the second graphical element 72. The update step S80 additionally makes use of the rescaled coordinates of the target keypoint 53, as explained earlier. Various types of attractors can be contemplated, which are known per se. However, it is preferred to rely on an attractor field that is devised so as to be minimal at the centre of the second graphical element 72 and beyond an attraction field boundary surrounding the second graphical element 72, and maximal at an intermediate distance between the centre of the second graphical element 72 and the attraction field boundary, as illustrated in FIG.5. Use can for instance be made of a cosine function, which is maximal at half the distance to the centre of the second element 72, see FIG. 6. The function actually depicted is ½ (1 − cos(2πu)), where u corresponds to the normalized radial distance (x-axis), while the value of the function reflects the attraction magnitude (y-axis). That is, this function is used to model the intensity of the attractor along the direction of a vector extending from the centre of a displayed graphical element to the position of the cursor 71, as discussed in detail in section 2 (an illustrative sketch is also given further below). FIG.7 shows a preferred flow. The GUI starts running at step S10. A loop is started at step S20, whereby the camera repeatedly acquires images. Only a subset of the images produced are fed S30 to the pose tracking algorithm, for it to identify (the first time) and then update keypoint coordinates. I.e., some of the images are dropped (S25: No, S35). The coordinate transformation is determined at step S40 and then applied (step S60) to the updated coordinates of the keypoints, so as to obtain transformed coordinates of the keypoints in the user reference frame. In parallel to steps S40 – S60, anthropometric data are computed at step S50; the transformed coordinates of the keypoints are rescaled at step S70, according to the anthropometric data. The position of a graphical element in the display reference frame is accordingly updated at step S80 and a corresponding signal is sent S90 to the display device to move the graphical element displayed.
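An illustrative sketch of the attractor just described; the gain and radius values are hypothetical tuning parameters, not values taken from the application.

    import math

    def attraction_magnitude(u):
        # Attractor intensity as a function of the normalized radial distance u in [0, 1]:
        # zero at the centre (u = 0) and at the attraction boundary (u = 1), maximal at u = 0.5
        if u <= 0.0 or u >= 1.0:
            return 0.0
        return 0.5 * (1.0 - math.cos(2.0 * math.pi * u))

    def apply_attractor(cursor, target_centre, radius, gain=0.05):
        # Pull the cursor (first graphical element) towards the centre of the second element
        dx, dy = target_centre[0] - cursor[0], target_centre[1] - cursor[1]
        dist = math.hypot(dx, dy)
        if dist == 0.0 or dist >= radius:
            return cursor
        w = gain * attraction_magnitude(dist / radius)
        return (cursor[0] + w * dx, cursor[1] + w * dy)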
As seen in FIG. 8, a monitoring algorithm can be run in parallel to the flow of FIG. 7 (this corresponds to Flow #1, step S110 in FIG. 8). The relative position of the first and second graphical elements (see FIG.4) is monitored at step S120, which is continually performed. If the positions of the two elements are determined to match (S130: Yes), a computer vision algorithm is run, and/or a timer is triggered. Next, the method checks S140 whether a certain criterion is met. I.e., is the computer vision algorithm able to identify a predetermined gesture? Has a predefined time period elapsed? If so (S140: Yes), an action (e.g., selection, start execution, display drop-down menu, etc.) is triggered at step S150 in respect of the second graphical element. Note, the algorithm keeps on monitoring S120 the relative position of the graphical elements, irrespective of the outcomes of steps S130 and S140. Next, according to another aspect, the invention can be embodied as a computerized system 1 for enabling a contactless user interface. The computerized system 1 comprises a camera 2, a display device 4, and processing means 230. The system 1 may be a desktop computer 3, a smartphone, a tablet, a laptop, etc. In all cases, the processing means 230 are configured to execute S10 a GUI 7, instruct the camera 2 to repeatedly acquire S20 images of a user 5, and perform steps (see, e.g., steps S25 – S90 in FIG. 7) as described earlier in reference to the present methods. The processing means may for instance be adequately configured by loading software (e.g., as initially stored in a storage device 255) into the main memory of the system 1 and having the processing means 230 execute the corresponding instructions. Closely related, a final aspect of the invention concerns a computer program product for enabling a contactless user interface. The computer program product comprises a computer readable storage medium having program instructions embodied therewith. The program instructions are executable by processing means 230 of a computerized system 1 as described above, to cause the computerized system 1 to take steps according to the present methods. The program instructions may typically embody several software modules, as depicted in FIG. 10. That is, in addition to physical components (camera 2, display device 4), the processing means 230 may be configured to implement several software modules 231 – 238. For example, a first module 231 is run to execute the pose tracking algorithm, which causes the coordinate update module 232 to update the coordinates of the keypoints in the base reference frame FB. The module 233 is run to compute anthropometric data from the updated coordinates. The module 234 computes and applies the coordinate transformation to obtain transformed coordinates of the keypoints in the user reference frame FU. The transformed coordinates are then rescaled (according to the anthropometric data) by the module 235, whereby a tracking module 236 can update positions of the graphical elements to be displayed in the display reference frame FS in accordance with the rescaled coordinates of the target keypoint. Additional modules 237 – 238 may be involved. For example, an optical distortion module 237 may be run to correct keypoint coordinates. Moreover, a monitoring module 238 may be executed, which may run a computer vision algorithm to identify specific user gestures. Alternatively, or in addition to the computer vision algorithm, the module 238 may involve a timer to trigger an action in respect of a graphical element displayed, as explained earlier.
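As an illustration of the timer-based variant of the monitoring algorithm (dwell-time selection), one might proceed as follows; the dwell duration and the bounding-box representation of the second graphical element are assumptions of this sketch.

    import time

    class DwellMonitor:
        # Trigger an action when the cursor stays over a displayed element for long enough
        def __init__(self, dwell_seconds=1.5):
            self.dwell_seconds = dwell_seconds
            self._entered_at = None

        def update(self, cursor, element_bbox):
            x, y = cursor
            x0, y0, x1, y1 = element_bbox
            inside = x0 <= x <= x1 and y0 <= y <= y1
            if not inside:
                self._entered_at = None       # cursor left the element: reset the timer
                return False
            if self._entered_at is None:
                self._entered_at = time.monotonic()
            return time.monotonic() - self._entered_at >= self.dwell_seconds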
The signals consumed and produced by the various modules 231 – 238 and components 2, 4 transit through input/output (I/O) management units 260 – 270, see also FIG. 9. In particular, the updated positions of the graphical elements can be sent to the display device 4, for it to accordingly display movements of elements manipulated by the user, in operation. Additional features of the present computerized systems 1 and computer program products are described in section 3. The above embodiments have been succinctly described in reference to the accompanying drawings and may accommodate a number of variants. Several combinations of the above features may be contemplated. Examples are given in the next section.

2. Particularly preferred embodiments

The following describes particularly preferred embodiments, which rely on algorithms that allow users to interact with a computer application through gestures captured by an RGB camera. Specifically, such gestures can be used to press "buttons" (i.e., icons) and move a cursor on the display screen. Such algorithms allow a mode of user interaction that does not require physical contact with any device, whether the main device on which the method primarily executes or a peripheral device. Thus, no peripheral device is needed. Instead, the main device (e.g., unit 200 in FIG. 9) captures the user's movements through an RGB camera 2. A computer vision algorithm processes the camera's images in real time, identifying and tracking a set of keypoints 51 – 54 on the body, such as the hands, wrists, shoulders, hips, knees, ankles, and feet. The position of a selected keypoint (i.e., target keypoint 53) is mapped to the position of a cursor 71 on the display screen, which the user can use to interact with the computer application. The proposed approach allows an intuitive use of the cursor 71 through the following innovations:
  • The movement of the cursor is independent of the rotation of the user in space;
  • The movement of the cursor is independent from translations of the user parallel to the camera plane; and
  • The movement of the cursor is independent from translations of the user in the direction perpendicular to the camera plane.
When the position of the cursor overlaps with that of a button or any other interactive element on the display screen, the user can interact with it in two ways: (a) by performing a specific hand gesture (e.g., opening and closing the hand, which would correspond to a mouse click); or (b) by overlapping the cursor's position with that of the button for a predetermined amount of time. The two mechanisms may possibly be concurrently implemented. As described in the previous section, keypoints correspond to actual body locations in the "world space", i.e., the physical world in which the user moves. The target keypoint is a point on the body that guides the cursor on the display screen. Movement of this point in the real world results in movement of the cursor on the display screen. A reference keypoint and reference plane are used to define a reference frame that translates and rotates together with the user. The reference plane is defined by the three non-collinear keypoints 52, 52r, see FIG. 2. The reference system has its origin at the reference keypoint 52r, and its XY plane coincides with the reference plane. The reference distance corresponds to the anthropometric feature, i.e., a distance between two keypoints on the body that are rigidly connected (e.g., the ears, the eyes).
2D images from the camera 2 can be associated with a base reference frame FB. The keypoints are generated by the pose tracking algorithm. They consist of XYZ coordinates for a set of points on the human body, identified from the 2D image of the RGB camera 2. The XY coordinates of each keypoint are set between 0 and 1, depending on the position of the keypoint in the image. The top left corner is [0, 0], and the bottom right corner is [1, 1]. The z-coordinate encodes information in the direction perpendicular to the camera frame. It is defined relative to the point at the centre of the hips and indicates whether another keypoint is in front of or behind the hips, using numbers between 0 and 1 that roughly correspond to the unit of the other axes x, y. Note, the z-coordinate only provides relative information between the centre of the hips and another keypoint. It does not capture information as to the absolute distance of the user from the camera. The display screen is associated with a further frame FS.

The user can control the position of the cursor 71 by moving the target keypoint 53. The aim is to map the position of a target point 53 in the physical space to the position of the cursor 71 on the display screen and allow the user to interact with elements displayed on the display screen. The following description addresses the following problems:
• The movement of the cursor should ideally be isolated from a rotation of the user in the physical space. I.e., a movement of the target keypoint should ideally result in a consistent movement of the cursor, independently of any rotation of the user with respect to the camera;
• Furthermore, the movement of the cursor should ideally be isolated from translations of the user parallel to the camera plane. Rigid translations of the whole body in the XY direction should not result in movement of the displayed cursor 71;
• The movement of the cursor should be isolated from the translation of the user along the camera axis, i.e., in the direction perpendicular to the camera plane. That is, a movement of the target keypoint should result in a consistent movement of the cursor, independently of the distance of the user from the camera plane; and
• Lack of fine motor control, latency in the computation of the keypoints, and/or jittering of the keypoints, may prevent the user from guiding the pointer 71 towards a desired interactive element in a smooth and intuitive way.

The following describes an adaptive mapping function to compensate for translations and rotations of the user. Referring to FIG. 3, the proposed solution is to transform the coordinates of the target keypoint 53 from the base reference frame FB, defined by axes x, y, z (unit vectors of the frame FB are defined along said axes), to the user reference frame FU attached to the user's body and defined by axes x′, y′, z′. Note, unit vectors of the frames FB and FU are defined along the axes x, y, z and x′, y′, z′, whereby the frames FB and FU can respectively be referred to as the xyz and x′y′z′ frames. The coordinate transformation sought can be defined as:
\[
\begin{bmatrix} x' \\ y' \\ z' \\ 1 \end{bmatrix}
=
\begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix},
\]
where R is a 3 × 3 rotation matrix and T is the 3 × 1 translation vector. The matrix on the right-hand side represents the combined transformation. The rotation matrix R can be defined using a combination of rotations around the x-axis, y-axis, and z-axis. The general form of the rotation matrix is R = R_z(α) R_y(β) R_x(γ), where α, β, and γ are the Euler angles between the xyz frame and the x′y′z′ frame, and R_x, R_y, and R_z are the rotation matrices. The rotation matrices around each axis can be defined as:
\[
R_z(\alpha) = \begin{bmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{bmatrix},
\qquad
R_y(\beta) = \begin{bmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{bmatrix},
\]
\[
R_x(\gamma) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\gamma & -\sin\gamma \\ 0 & \sin\gamma & \cos\gamma \end{bmatrix}.
\]

T is given by T = [T_x, T_y, T_z], where T_x, T_y, and T_z are the coordinates of the reference keypoint 52r in the xyz frame. This transformation ensures that the cursor's position is not affected by translations or rotations of the user in space, but only by relative movements between the target keypoint and the reference keypoint.

The following describes an adaptive mapping function to estimate the depth thanks to a rigid anthropometric feature, so as to isolate the cursor movements from any translation of the user along the camera axis. Because the z-coordinates in the frame FB do not indicate the absolute distance of the user from the camera, we may advantageously rely on the change in size of a rigid object as a proxy for the user's distance from the camera. This object can for instance be chosen as the segment 45 (also called the reference segment, see FIG. 3). The x and y coordinates can be scaled using the following formula:
\[
\begin{bmatrix} x_s \\ y_s \end{bmatrix} = \frac{d_0}{d} \begin{bmatrix} x' \\ y' \end{bmatrix},
\]
where d_0 is the length of the reference distance measured at time t = 0 (upon camera setup), and d is the reference distance at the current time point. The distance d is regularly updated, at a rate of 15 to 20 fps. As a result, the target position is not affected by the distance of the user from the camera.

Moreover, use can be made of attractors. Indeed, the lack of fine motor control, latency in the computation of the landmark positions, and/or jittering of the landmark position computation, may prevent the user from guiding the pointer towards a desired element 72 in a smooth and intuitive way. A solution is to add "attractors" around the targets 72. Targets can be any elements on the display screen, such as buttons and widgets, which the user can interact with through the cursor 71. The added attractors may act as gravity fields, pulling the cursor towards the centre of the interactive element when the cursor is within an attraction distance and thus caught by the attraction field. Care should be taken to carefully design such attractors, to prevent the user from accidentally pressing the wrong button or getting stuck in the element 72. A possibility is to design the attractors so that they act by adding an offset o to the position of the pointer p, when the cursor is within the attraction distance. This yields a vector p_o defined by: p_o = p + o. Use can advantageously be made of a "cosine velocity attractor" for the computation of o, namely:
\[
o_{t+1} = o_t + C\,(a - p)\,\frac{1 - \cos(2\pi u)}{2},
\qquad u = \frac{\lvert p - a \rvert}{\lvert p_b - a \rvert},
\]
where a are the coordinates of the centre of the attractor (see the black dot at the centre of the shape 72 in FIG. 4), p points to the pointer position (see the white dot in FIG. 4), and p_b points to the projection (see the patterned dot in FIG. 4) of p onto the attraction field border 80. Note, the projection is actually defined as the extension of p to the boundary 80. In other words, p_b points to a point where the boundary 80 is intersected by the axis passing through p and originating from the centre of the element 72. The constant C is adjusted to control the rate of attraction of the cursor 71 to the centre of the graphical element 72.

The behaviour of the above attractor can be described as follows. The cursor 71 is not attracted when located at the centre of the attractor (to prevent it from getting stuck at the shape 72) or at (or beyond) the boundary 80 of the attractor. However, the cursor 71 is attracted when located between the boundary 80 and the centre of the attractor. Note, this relation affects the velocity of the offset. A cosine-like function determines the amplitude of the attraction field, see FIG. 5. The cosine function is equal to 0 at u = 0 and at u = 1 (with u = |p − a| ⁄ |p_b − a|, the argument of the cosine being 2πu), and equal to 1 at u = 0.5. Such values correspond to the centre of the attractor, the edge of the attractor, and the edge of the attraction field, i.e., the boundary 80, assuming the attraction field has a diameter that is twice the diameter of the attractor. Note, this function is clipped to 0 for u > 1 and undefined for u < 0.

The accumulation of offset over multiple interactions with attractive elements may sometimes lead to the pointer drifting off the display screen. To counteract this, we may advantageously apply an exponential decay to the offset. That is, when the pointer is not within the attraction field of an attractor, one may use
\[
o_{t+1} = D\,o_t,
\]
where D is the decay constant (0 < D < 1). This ensures that the offset will go back to zero when the attraction is gone. The complete formulation of the offset value thus becomes:
\[
o_{t+1} =
\begin{cases}
o_t + C\,(a - p)\,\dfrac{1 - \cos(2\pi u)}{2}, & \text{if the pointer is within the attraction field},\\[4pt]
D\,o_t, & \text{otherwise.}
\end{cases}
\]
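To make the attractor mechanics concrete, here is a small Python sketch of one possible per-frame offset update implementing a cosine-shaped attraction with exponential decay, in line with the reconstruction of the equations above. The discrete-time formulation, the function name, and the parameter values are illustrative assumptions rather than prescribed values.

```python
import numpy as np

def update_offset(o, p, a, field_radius, C=0.05, D=0.9):
    """One per-frame update of the attractor offset o (2D numpy arrays).

    o            : current offset added to the raw pointer position
    p            : raw pointer position (display-frame coordinates)
    a            : centre of the interactive element (attractor centre)
    field_radius : radius of the attraction field boundary 80, i.e. |p_b - a|
    C            : attraction-rate constant
    D            : exponential decay constant, 0 < D < 1
    """
    u = np.linalg.norm(p - a) / field_radius
    if 0.0 < u < 1.0:
        # Inside the attraction field: cosine-shaped pull towards the centre,
        # zero at the centre (u = 0) and at the boundary (u = 1), maximal at u = 0.5.
        amplitude = 0.5 * (1.0 - np.cos(2.0 * np.pi * u))
        o = o + C * (a - p) * amplitude
    else:
        # At the centre or outside the field: let the accumulated offset decay to zero.
        o = D * o
    return o

# The displayed cursor position is then p_o = p + o.
```

In practice, C and D would be tuned so that the pull is perceptible yet the cursor can still be driven out of the field, the decay branch realizing the exponential decay described above.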
3. Technical implementation details

In embodiments, the method steps S10 – S150 described earlier in reference to FIGS. 7 and 8 are implemented in (or triggered by) software, e.g., one or more executable programs, executed by processing means 230 of the computerized system 1, which may include one or more computerized units such as depicted in FIG. 9. Preferably, though, the system 1 consists of a single unit 200, such as a smartphone, a tablet, a laptop, or a desktop computer, which integrates all required components, starting with the camera 2, the display device 4, and the processing means 230.

Computerized devices and components can be suitably configured for implementing embodiments of the present invention as described herein. In that respect, it can be appreciated that the methods described herein are partly non-interactive, i.e., partly automated. Automated parts of such methods can be implemented in software, hardware, or a combination thereof. In exemplary embodiments, automated parts of the methods described herein are implemented in software, as a service or an executable program (e.g., an application), the latter executed by suitable digital processing devices. The methods described herein are typically in the form of an executable program, a script, or, more generally, any form of executable instructions.

As depicted in FIG. 9, a typical computerized device (or unit) 200 may include a processor 230 and a memory 250 (possibly including several memory units) coupled to one or more memory controllers 240. The processor 230 is a hardware device for executing software loaded in a main memory of the device. The processor can be any custom-made or commercially available processor. The processor may notably be a central processing unit (CPU), as assumed in FIG. 9. Note, however, that some of the operations (in particular related to the pose tracking) may possibly be offloaded to a peripheral processing unit, such as a GPU, or remotely executed, e.g., by a server in data communication with the unit 200.

The memory 250 of the unit 200 typically includes a combination of volatile memory elements (e.g., random access memory) and non-volatile memory elements, e.g., a solid-state device. The software in memory may include one or more separate programs, each of which comprises executable instructions for implementing functions as described herein. In the example of FIG. 9, the software in the memory includes methods described herein in accordance with exemplary embodiments and a suitable OS. The OS essentially controls the execution of other computer (application) programs and provides scheduling, input-output control, file and data management, memory management, and communication control and related services. It may further control the distribution of tasks to be performed by various processing units.

The computerized unit 200 can further include a display controller 282 coupled to a display device 4. In exemplary embodiments, the computerized unit 200 further includes a network interface 290 or transceiver for coupling to a network (not shown). In addition, the computerized unit 200 will typically include one or more input and/or output (I/O) devices 2, 210, 220 (or peripherals, including the camera 2) that are communicatively coupled via a local I/O controller 260. A system bus 270 interfaces all components. Further, the local interface may include address, control, and/or data connections to enable appropriate communications among the aforementioned components.
The I/O controller 260 may have additional elements, which are omitted for simplicity, such as controllers, buffers (caches), drivers, repeaters, and receivers, to allow data communication.

When the computerized unit 200 is in operation, one or more processing units 230 execute software stored within the memory of the computerized unit 200, to communicate data to and from the memory 250 and/or the storage unit 255 (e.g., a hard drive and/or a solid-state memory), and to generally control operations pursuant to software instructions. The methods described herein and the OS are read (in whole or in part) by the processing elements, typically buffered therein, and then executed. When the methods described herein are implemented in software, the methods can be stored on any computer readable medium for use by or in connection with any computer-related system or method.

Computer readable program instructions described herein can be downloaded to processing elements from a computer readable storage medium, via a network, for example, the Internet and/or a wireless network. A network adapter card or network interface 290 may receive computer readable program instructions from the network and forward such instructions for storage in a computer readable storage medium 255 interfaced with the processing means 230.

Aspects of the present invention are described herein notably with reference to a flowchart and a block diagram. It will be understood that each block, or combinations of blocks, of the flowchart and the block diagram can be implemented by computer readable program instructions. These computer readable program instructions may be provided to one or more processing elements 230 as described above, to produce a machine, such that the instructions, which execute via the one or more processing elements, create means for implementing the functions or acts specified in the blocks of the flowcharts of FIGS. 7, 8 and the block diagram of FIG. 10. These computer readable program instructions may also be stored in a computer readable storage medium.

The flowchart and the block diagram in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of the computerized unit 200, methods of operating it, and computer program products, according to various embodiments of the present invention. Note that each computer-implemented block in the flowcharts or the block diagram may represent a (sub)module, or a set of instructions, which comprise(s) executable instructions for implementing the functions or acts specified therein. In variants, the functions or acts mentioned in the blocks may occur out of the order specified in the figures. For example, two blocks shown in succession may actually be executed in parallel, concurrently, or still in a reverse order, depending on the functions involved and the algorithm optimization retained. It is also reminded that each block and combinations thereof can be adequately distributed among special-purpose hardware components.

While the present invention has been described with reference to a limited number of embodiments, variants, and the accompanying drawings, it will be understood by those skilled in the art that various changes may be made, and equivalents may be substituted without departing from the scope of the present invention.
In particular, a feature (device-like or method-like) recited in a given embodiment, variant, or drawing may be combined with or replace another feature in another embodiment, variant, or drawing, without departing from the scope of the present invention. Various combinations of the features described in respect of any of the above embodiments or variants may accordingly be contemplated, which remain within the scope of the appended claims. In addition, many minor modifications may be made to adapt a particular situation or material to the teachings of the present invention without departing from its scope. Therefore, it is intended that the present invention is not limited to the particular embodiments disclosed, but that the present invention will include all embodiments falling within the scope of the appended claims. In addition, many other variants than those explicitly touched upon above can be contemplated. For example, other types of attractors may be relied on.

Claims

CLAIMS

1. A computer-implemented method of enabling a contactless user interface, the method comprising: executing (S10) a graphical user interface (7) to display a graphical element (71) on a display device (4); instructing a camera (2) to repeatedly acquire (S20) images of a user (5); and instructing, for each image of at least some of the images acquired, to execute (S30) a pose tracking algorithm to update coordinates of keypoints (51 – 54) of the user (5) in a base reference frame (FB) corresponding to said each image, the keypoints including a target keypoint (53), compute (S50), from the updated coordinates, anthropometric data capturing a relative size of an anthropometric feature (6) of the user (5) in said each image, apply (S40, S60) a coordinate transformation to the updated coordinates to obtain transformed coordinates of the keypoints in a reference frame (FU) of the user (5), rescale (S70) the transformed coordinates according to said anthropometric data, update (S80) a position of the graphical element (71) in a reference frame (FS) of the display according to the rescaled coordinates of the target keypoint (53), and send (S90) a signal encoding the updated position to the display device (4) to accordingly move the graphical element (71) displayed.

2. The computer-implemented method according to claim 1, wherein the anthropometric data computed includes a relative size of a geometric element bounded by at least two (51) selected ones of the keypoints (51 – 54), and the transformed coordinates are rescaled (S70) according to a scaling function taking as argument a ratio of a reference size to said relative size as computed in the base reference frame (FB) for said each image.

3. The computer-implemented method according to claim 2, wherein said relative size is a relative length of a line segment bounded by two selected ones of the keypoints (51), and said ratio is a ratio of a reference length to said length as computed for said each image.

4. The computer-implemented method according to claim 2 or 3, wherein the reference size corresponds to a value of said size as computed for an initial image of the repeatedly acquired images.

5. The computer-implemented method according to claim 2 or 3, wherein the reference size is determined in accordance with one or each of a width and a height of said each image.

6. The computer-implemented method according to any one of claims 1 to 5, wherein the method further comprises, for said each image, applying a correction to compensate for an optical distortion of the updated coordinates of the keypoints, in the base reference frame (FB), by the camera (2).

7. The computer-implemented method according to any one of claims 1 to 6, wherein the camera (2) is configured to repeatedly acquire (S20) said images at a given frame rate R1; the pose tracking algorithm is instructed to execute for each image of only a subset of the repeatedly acquired images, at an average rate R2 that is strictly less than R1; and, preferably, R1/2 ≤ R2 ≤ 2 R1/3.
8. The computer-implemented method according to any one of claims 1 to 7, wherein the method further comprises, for said each image, determining the coordinate transformation between the base reference frame (FB) and the reference frame (FU) of the user based on the updated coordinates of selected ones (52, 52r) of the keypoints (51 – 54) in the base reference frame (FB), and the selected ones (52, 52r) of the keypoints include three non-collinear keypoints (52, 52r), which define the reference frame (FU) of the user (5).

9. The computer-implemented method according to claim 8, wherein the coordinate transformation applied (S60) combines a rotation and a translation, the rotation is defined based on Euler angles between said base reference frame (FB) and the reference frame of the user (5), and the translation is defined based on coordinates of a reference keypoint (52r) in the base reference frame (FB), wherein the reference keypoint (52r) is selected from the three non-collinear keypoints and defines an origin of the reference frame (FB) of the user (5).

10. The computer-implemented method according to claim 9, wherein the coordinate transformation is defined and applied (S60) as a single matrix multiplication, and the coordinate transformation is optionally applied (S60) by a graphics processor unit.

11. The computer-implemented method according to any one of claims 1 to 10, wherein said graphical element (71) is a first graphical element (71), the graphical user interface (7) is executed (S10) to display a second graphical element (72) on the display device (4).

12. The computer-implemented method according to claim 11, wherein the method further comprises running (S130 – S140) a monitoring algorithm to detect (S130: Yes; S140: Yes) a potential action to be performed (S150) on the second graphical element (72) based on a relative position, in the reference frame (FS) of the display, of the first graphical element (71) and the second graphical element (72).

13. The computer-implemented method according to claim 12, wherein the monitoring algorithm includes one or each of: a computer vision algorithm to detect a particular gesture of the user (5) triggering said action; and a timer triggering said action based on a time duration during which a position of the first graphical element (71) coincides with a position of the second graphical element.

14. The computer-implemented method according to claim 11 or 13, wherein said potential action is a selection of the second graphical element (72), and the method further comprises, upon detecting said potential action, instructing, for each image of at least some of the images subsequently acquired by the camera (2), to execute the pose tracking algorithm and rescale the subsequently transformed coordinates with a view to updating a position of the second graphical element (72), as previously done in respect of the first graphical element (71).

15. The computer-implemented method according to any one of claims 11 to 14, wherein the position of the first graphical element (71) is updated (S80) according to an attractor field of the second graphical element (72), in addition to said rescaled coordinates of the target keypoint (53).
16. The computer-implemented method according to claim 15, wherein said attractor field is devised so as to be minimal at a centre of the second graphical element (72) and beyond an attraction field boundary surrounding the second graphical element (72), and maximal at an intermediate distance between the centre of the second graphical element (72) and the attraction field boundary.

17. A computerized system for enabling a contactless user interface, wherein the computerized system comprises a camera (2), a display device (4), and processing means, the latter configured to: execute (S10) a graphical user interface (7) to display a graphical element (71) on the display device (4); instruct the camera (2) to repeatedly acquire (S20) images of a user (5); and perform, for each image of at least some of the images acquired, each of the following steps: executing (S30) a pose tracking algorithm to update coordinates of keypoints (51 – 54) of the user (5) in a base reference frame (FB) corresponding to said each image, the keypoints including a target keypoint (53), computing (S50), from the updated coordinates, anthropometric data capturing a relative size of an anthropometric feature (6) of the user (5) in said each image, applying (S40, S60) a coordinate transformation to the updated coordinates to obtain transformed coordinates of the keypoints in a reference frame (FU) of the user (5), rescaling (S70) the transformed coordinates according to said anthropometric data, updating (S80) a position of the graphical element (71) in a reference frame (FS) of the display according to the rescaled coordinates of the target keypoint (53), and sending (S90) a signal encoding the updated position to the display device (4) to accordingly move the graphical element (71) displayed.

18. A computer program product for enabling a contactless user interface, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by processing means of a computerized system, which further comprises a camera (2) and display device (4), to cause the computerized system to: execute (S10) a graphical user interface (7) to display a graphical element (71) on a display device (4); instruct a camera (2) to repeatedly acquire (S20) images of a user (5); and instruct, for each image of at least some of the images acquired, to execute (S30) a pose tracking algorithm to update coordinates of keypoints (51 – 54) of the user (5) in a base reference frame (FB) corresponding to said each image, the keypoints including a target keypoint (53), compute (S50), from the updated coordinates, anthropometric data capturing a relative size of an anthropometric feature (6) of the user (5) in said each image, apply (S40, S60) a coordinate transformation to the updated coordinates to obtain transformed coordinates of the keypoints in a reference frame (FU) of the user (5), rescale (S70) the transformed coordinates according to said anthropometric data, update (S80) a position of the graphical element (71) in a reference frame (FS) of the display according to the rescaled coordinates of the target keypoint (53), and send (S90) a signal encoding the updated position to the display device (4) to accordingly move the graphical element (71) displayed.
PCT/EP2023/066218 2023-06-16 2023-06-16 Enabling a contactless user interface for a computerized device equipped with a standard camera Pending WO2024256017A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2023/066218 WO2024256017A1 (en) 2023-06-16 2023-06-16 Enabling a contactless user interface for a computerized device equipped with a standard camera

Publications (1)

Publication Number Publication Date
WO2024256017A1 (en)

Family

ID=87003209

Country Status (1)

Country Link
WO (1) WO2024256017A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110306421A1 (en) * 2010-06-11 2011-12-15 Namco Bandai Games Inc. Image generation system, image generation method, and information storage medium
US20120327125A1 (en) * 2011-06-23 2012-12-27 Omek Interactive, Ltd. System and method for close-range movement tracking
US20150116213A1 (en) * 2011-08-23 2015-04-30 Hitachi Maxell, Ltd. Input unit
US10782847B2 (en) * 2013-01-15 2020-09-22 Ultrahaptics IP Two Limited Dynamic user interactions for display control and scaling responsiveness of display objects

Similar Documents

Publication Publication Date Title
US11269481B2 (en) Dynamic user interactions for display control and measuring degree of completeness of user gestures
JP7213899B2 (en) Gaze-Based Interface for Augmented Reality Environments
EP2755194B1 (en) 3d virtual training system and method
CN102449577B (en) Virtual desktop coordinate transformation
KR101453815B1 (en) 2014-10-22 Device and method for providing user interface which recognizes a user's motion considering the user's viewpoint
Lu et al. Immersive manipulation of virtual objects through glove-based hand gesture interaction
US20130343607A1 (en) Method for touchless control of a device
JP7575160B2 (en) Method and system for selecting an object - Patent application
CN103443742A (en) Systems and methods for a gaze and gesture interface
CN107771309A (en) 3D user input
CN110215685B (en) Method, device, equipment and storage medium for controlling virtual object in game
KR20150040580A (en) virtual multi-touch interaction apparatus and method
US20160004315A1 (en) System and method of touch-free operation of a picture archiving and communication system
Placidi et al. Data integration by two-sensors in a LEAP-based Virtual Glove for human-system interaction
CN113658249B (en) Virtual reality scene rendering method, device, equipment and storage medium
EP3944228A1 (en) Electronic device, method for controlling electronic device, program, and storage medium
WO2024256017A1 (en) Enabling a contactless user interface for a computerized device equipped with a standard camera
CN107957781B (en) Information display method and device
US20140062997A1 (en) Proportional visual response to a relative motion of a cephalic member of a human subject
Liu et al. COMTIS: Customizable touchless interaction system for large screen visualization
JP7513262B2 (en) Terminal device, virtual object operation method, and virtual object operation program
KR20220108417A (en) Method of providing practical skill training using hmd hand tracking
US11604517B2 (en) Information processing device, information processing method for a gesture control user interface
Majewski et al. Providing visual support for selecting reactive elements in intelligent environments
US20160004318A1 (en) System and method of touch-free operation of a picture archiving and communication system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23734185

Country of ref document: EP

Kind code of ref document: A1