
WO2024256017A1 - Enabling a contactless user interface for a computerized device equipped with a standard camera - Google Patents


Info

Publication number
WO2024256017A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
coordinates
graphical element
reference frame
keypoints
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/EP2023/066218
Other languages
French (fr)
Inventor
Florian HAUFE
Michele XILOYANNIS
Tiberiu Ioan MUSAT
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Akina Ag
Original Assignee
Akina Ag
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Akina Ag filed Critical Akina Ag
Priority to PCT/EP2023/066218 priority Critical patent/WO2024256017A1/en
Publication of WO2024256017A1 publication Critical patent/WO2024256017A1/en

Classifications

    • G06F3/011 — Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/012 — Head tracking input arrangements
    • G06F3/017 — Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G06F3/0481 — Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment
    • G06F3/0484 — Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object
    • G06V40/161 — Human faces: detection; localisation; normalisation
    • G06V40/165 — Detection, localisation or normalisation using facial parts and geometric relationships
    • G06V40/23 — Recognition of whole body movements, e.g. for sport training
    • G06V40/28 — Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G06V2201/033 — Recognition of patterns in medical or anatomical images of skeletal patterns

Definitions

  • the invention relates in general to the field of computer-implemented methods, computerized systems, and computer program products for enabling a contactless user interface, e.g., using a standard RGB camera.
  • it is directed to methods relying on changes in the relative size (i.e., an apparent dimension) of a given anthropometric feature of the user to compensate for changes in the user’s distance to the camera, as well as other user movements including rotations and translations parallel to the camera plane.
  • BACKGROUND Gesture recognition aims at interpreting human gestures (typically originating from the face or hand) through techniques of computer vision and image processing.
  • for example, kinetic user interfaces (KUIs) have been proposed, which allow users to interact with computing devices through the motion of objects and bodies.
  • Such interfaces typically involve sensorised gloves, stereo cameras, and gesture-based controllers. A drawback of such approaches is that they require specific equipment.
  • the present invention is embodied as a computer-implemented method of enabling a contactless user interface.
  • the method comprises executing a graphical user interface to display a graphical element on a display device and instructing a camera to repeatedly acquire images of a user.
  • the method further comprises instructing to perform the following steps, for each image of at least some of the images acquired.
  • a pose tracking algorithm is executed to update coordinates of keypoints of the user in a base reference frame corresponding to said each image. I.e., each image considered can be associated with a corresponding base frame.
  • the keypoints notably include a target keypoint, which will be used to guide the graphical element.
  • the method computes anthropometric data, which capture a relative size of an anthropometric feature of the user in said each image.
  • the method further applies a coordinate transformation to the updated coordinates to obtain transformed coordinates of the keypoints in a reference frame of the user.
  • the method rescales the transformed coordinates according to said anthropometric data.
  • the method updates a position of the graphical element in a reference frame of the display according to the rescaled coordinates of the target keypoint.
  • it sends a signal encoding the updated position to the display device to accordingly move the graphical element displayed.
  • a touchless user interface control is enabled, whereby the proposed method does not require any physical contact between the user and any physical device.
  • the method can typically rely on images acquired by a conventional camera, e.g., a standard RGB camera.
  • the computed anthropometric data includes a relative size of a geometric element, which is bounded by at least two selected ones of the keypoints.
  • the transformed coordinates are simply rescaled according to a scaling function taking as argument a ratio of a reference size to said relative size as computed in the base reference frame for said each image.
  • said relative size is a relative length of a line segment bounded by two selected ones of the keypoints, and said ratio is a ratio of a reference length to said length as computed for said each image.
  • the reference size corresponds to a value of said relative size as computed for an initial image of the repeatedly acquired images.
  • the reference size is determined in accordance with one or each of a width and a height of said each image.
  • the method further comprises, for said each image, applying a correction to compensate for an optical distortion (by the camera) of the updated coordinates of the keypoints in the base reference frame.
  • the camera is configured to repeatedly acquire said images at a given frame rate R1.
  • the pose tracking algorithm is instructed to execute for each image of only a subset of the repeatedly acquired images, at an average rate R2 that is strictly less than R1.
  • preferably, R1/2 ≤ R2 ≤ 2R1/3, which allows more time for the pose tracking algorithm used to infer the keypoints.
  • the method further comprises, for said each image, determining the coordinate transformation between the base reference frame and the reference frame of the user based on the updated coordinates of selected ones of the keypoints of the user in the base reference frame.
  • the selected ones of the keypoints include three non-collinear keypoints, which define the reference frame of the user.
  • the applied coordinate transformation combines a rotation and a translation, the rotation is defined based on Euler angles between said base reference frame and the reference frame of the user, and the translation is defined based on coordinates of a reference keypoint in the base reference frame.
  • the reference keypoint corresponds to an origin of the reference frame of the user. It is selected from the three non-collinear keypoints and defines the origin of the reference frame of the user.
  • the coordinate transformation may advantageously be defined and applied as a single matrix multiplication.
  • the single matrix multiplication is optionally offloaded to a graphics processor unit, in the interest of computational speed.
  • said graphical element is a first graphical element.
  • the execution of the graphical user interface may cause to display additional graphical elements on the display device, in particular a second graphical element.
  • the method further comprises running a monitoring algorithm to detect a potential action to be performed on the second graphical element based on a relative position, in the reference frame of the display, of the first graphical element and the second graphical element.
  • the monitoring algorithm includes one or each of: a computer vision algorithm to detect a particular gesture of the user triggering said action; and a timer triggering said action based on a time duration during which a position of the first graphical element coincides with a position of the second graphical element.
  • said potential action is a selection of the second graphical element, and the method further comprises, upon detecting said potential action, instructing, for each image of at least some of the images subsequently acquired by the camera, to execute the pose tracking algorithm and rescale the subsequently transformed coordinates with a view to updating a position of the second graphical element, as previously done in respect of the first graphical element.
  • the position of the first graphical element is updated according to an attractor field of the second graphical element, in addition to said rescaled coordinates of the target keypoint.
  • said attractor field is devised so as to be minimal at a centre of the second graphical element and beyond an attraction field boundary surrounding the second graphical element, and maximal at an intermediate distance between the centre of the second graphical element and the attraction field boundary.
  • the invention is embodied as a computerized system for enabling a contactless user interface, wherein the computerized system comprises a camera, a display device, and processing means.
  • the processing means are configured to execute a graphical user interface to display a graphical element on the display device, instruct the camera to repeatedly acquire images of a user, and perform, for each image of at least some of the images acquired, each of the following steps.
  • a pose tracking algorithm is executed to update coordinates of keypoints of the user in a base reference frame corresponding to said each image, the keypoints including a target keypoint.
  • anthropometric data are computed from the updated coordinates.
  • the anthropometric data capture a relative size of an anthropometric feature of the user in said each image.
  • a coordinate transformation is applied to the updated coordinates to obtain transformed coordinates of the keypoints in a reference frame of the user.
  • the transformed coordinates are subsequently rescaled according to said anthropometric data.
  • a position of the graphical element in a reference frame of the display is then updated according to the rescaled coordinates of the target keypoint, and a signal encoding the updated position is sent to the display device to accordingly move the graphical element displayed.
  • a final aspect concerns a computer program product for enabling a contactless user interface.
  • the computer program product comprises a computer readable storage medium having program instructions embodied therewith.
  • the program instructions are executable by processing means of a computerized system, which further comprises a camera and display device, to cause the computerized system to perform steps as described above in respect of the present methods.
  • FIG.1 is a diagram illustrating interactions between a user, a camera, processing means, and a display device, to enable a contactless user interface, as in embodiments
  • FIG.2 shows a user in the physical world.
  • Several virtual features are mapped onto the user by an algorithm, where such features include a reference plane, keypoints, and an anthropometric feature (a reference distance), as in embodiments;
  • FIG. 3 schematically illustrates the mapped features in a base reference frame corresponding to an image acquired by a camera, as involved in embodiments. I.e., the features correspond to a user as seen by the camera;
  • FIG.4 shows an application user interface run on a display device, where the application user interface displays two objects that can be controlled by a user in a contactless manner, as in embodiments;
  • FIG. 5 schematically illustrates an attractor field set around a displayed graphical element to ease user interactions, as involved in embodiments;
  • FIG.6 is a plot of a cosine function, which is used to model the intensity of the attractor along a direction of a vector extending from the centre of a displayed graphical element to the position of a cursor, in embodiments;
  • FIG. 7 is a flowchart illustrating high-level steps of a method of enabling a contactless user interface, according to embodiments.
  • FIG.8 is a flowchart illustrating high-level steps of a monitoring algorithm, allowing a user to trigger an action in respect of a displayed graphical element, as in embodiments;
  • FIG. 9 schematically represents a general-purpose computerized system, suited for implementing one or more method steps as involved in embodiments of the invention;
  • FIG. 10 is a block diagram schematically illustrating selected components of, and modules implemented by, a computerized system for enabling a contactless user interface, according to embodiments.
  • the accompanying drawings show simplified representations of devices or parts thereof, as involved in embodiments.
  • This aspect concerns a computer-implemented method of enabling a contactless (or touchless) user interface.
  • the method can for instance be performed by a computerized system 1, e.g., a computerized unit 200 (e.g., a smartphone, a tablet, a laptop) such as shown in FIG. 9.
  • the system 1 comprises a processing unit 3, a camera 2, and a display device 4, as illustrated in FIG.1.
  • the computerized unit 200 integrates all the components (processing means, display device, and camera).
  • the steps of the method are executed, or triggered, by processing means, which are operatively connected to the camera 2 and the display device 4.
  • some of these steps may possibly be offloaded to a paired processor or a connected device.
  • E.g., such steps may be remotely performed at a server connected to a unit 200.
  • the system 1 actually concerns another aspect, which is described later in detail.
  • the method involves the execution S10 of a graphical user interface (GUI) 7.
  • GUI 7 may be executed as part of, or jointly with, an application or an operating system (OS) executing on the computerized system 1.
  • the application may be any application, e.g., a game, a word processing application.
  • this application is a digital health application, such as a digital coaching application.
  • executing the GUI causes to display one or more graphical elements 71, 72 on the display device 4, as illustrated in FIG.4.
  • graphical elements 71, 72 may include any graphical control element, such as a cursor, a mouse pointer, a widget, or any graphical image (e.g., an icon representing any object), with which the user interacts or through which the user interacts with other graphical elements displayed by the GUI, as in embodiments discussed later.
  • the elements 71, 72 include a cursor (or mouse pointer) 71 and a virtual object 72.
  • a typical application scenario of the present methods is one in which the user wants to select the object 72 through the cursor 71, through touchless interactions.
  • the method comprises instructing a camera 2 to repeatedly acquire S20 images of a user 5. Images acquired by the camera 2 are then exploited to enable contactless user actions with the GUI, as illustrated in the diagram of FIG. 1.
  • the method instructs to perform a series of steps, for each image of at least some of the images acquired. That is, not all images may be exploited; some images may be dropped (S25, S35), for reasons discussed later.
  • the following describes the series of steps performed to enable the contactless user actions.
  • As seen in the flow of FIG.7, such steps include executing S30 a pose tracking algorithm.
  • Such an algorithm is known per se: it uses machine learning techniques to infer keypoints on the user’s body.
  • the aim is to identify, and then update, coordinates of keypoints 51 – 54 of the user 5.
  • the keypoints correspond to, i.e., are associated with, defined locations on the user body.
  • the identified keypoints notably include a target keypoint 53, which is used to control a GUI element 71, as explained below.
  • Such keypoints are defined in a base reference frame FB corresponding to each image.
  • the base reference frame FB of a given image is schematically depicted in FIG.3.
  • the reference frame FB is distinct from the reference frame F U of the user, as also seen in FIG.3.
  • a base reference frame is a frame corresponding to each image; it defines the keypoint space.
  • a further reference frame FS is shown in FIG.4, which corresponds to the frame subtended by the display screen of the display device 4.
  • the method computes S50 anthropometric data from the updated coordinates of the keypoints.
  • such anthropometric data capture a relative size of an anthropometric feature 6 of the user 5 in each image considered. That is, this relative size is an apparent dimension of the anthropometric feature 6, as seen by the camera.
  • the apparent size of the anthropometric feature is defined in the base reference frame FB.
  • the size of the anthropometric feature is computed based on coordinates of at least two of the keypoints, as defined in the base reference frame F B of each image.
  • This anthropometric feature is an anthropometric characteristic, which can notably be an ear-to-ear distance, an eye-to-eye distance, a shoulder-to-shoulder distance, a torso area, a leg length, etc.
  • this anthropometric feature corresponds to a physical characteristic of the user.
  • the anthropometric feature 6 corresponds to the ear-to-ear distance 45, which is calculated from the coordinates of selected keypoints 51, as defined in the base reference frame F B .
  • the method further applies S40, S60 a coordinate transformation to the updated coordinates of the keypoints.
  • the aim is to obtain transformed coordinates of the keypoints in the reference frame F U of the user 5.
  • the method rescales S70 the transformed coordinates of the keypoints (i.e., the coordinates as now defined in the reference frame FU) according to the relative size of said anthropometric feature, i.e., in accordance with said anthropometric data. All keypoints of interest can be rescaled, including the target keypoint 53.
  • This step is pivotal, as it makes it possible to compensate for changes in the user’s distance to the camera, as further discussed later.
  • the method can now update S80 a position of the graphical element 71 in the reference frame FS of the display according to the rescaled coordinates of the target keypoint 53.
  • the method sends S90 a signal encoding the updated position to the display device 4 to accordingly move the graphical element 71 as displayed in the display screen of the display device 4.
  • the same operations are repeated for each image of said at least some of the images acquired by the camera 2. This way, a touchless user interface control is enabled, which does not require any physical contact between the user 5 and any physical device.
  • the proposed method can be regarded as a real-time gesture tracking method, though not limited to hand, or head movements. Unlike other markerless systems relying on depth cameras or inertial measurement units (IMUs), here the method can typically rely on a conventional camera 2, such as a basic RGB camera. That is, the proposed method does not require any specific equipment.
  • a digital health application is a scenario in which a person engages in home-based physical therapy.
  • a software program is run on a computer equipped with an RGB webcam, so as to implement a method as described above, with a view to providing guidance and motivation during the performance of physical therapy exercises.
  • the person typically needs to stand at a distance from the computer, making it inconvenient to use traditional input devices like a keyboard or mouse.
  • the proposed method enables the user to interact in a contactless manner with the software and the computer, eliminating the need to physically touch any device to interact with the software.
  • this person may come to pause and resume a session, pause and resume playback of a video, adjust the sound volume, or exit the application.
  • Pose tracking algorithms mostly assume images obtained from RGB cameras. Thus, various types of pose tracking algorithms (which are known per se) may be contemplated for use in the present context. A pose tracking algorithm makes it possible to extract keypoints 51 – 54 using machine learning methods, which are also known per se.
  • Suitable pose tracking algorithms include the so-called BlazePose GHUM (https://arxiv.org/abs/2210.06551), D3DP , MixSTE (https://arxiv.org/abs/2203.00859), U- CondDGConv@GT_2D_Pose (https://arxiv.org/pdf/2107.07797v2.pdf), and UGCN (https://arxiv.org/pdf/2004.13985v1.pdf) algorithms.
  • the pose tracking algorithm may be restricted to the extraction of a few, predetermined keypoints 51, 52, 52r, 53.
  • the keypoints 51 – 54 are spatial locations, i.e., points of particular interest in each image.
  • the target keypoint 53 is a selected point on the body; this keypoint 53 is used to guide the graphical element 71 (e.g., a cursor, as assumed in the following) on the display screen. That is, movement of this point 53 in the physical (i.e., real) world results in movement of the cursor 71 on the display screen. Outputs from the pose tracking algorithm are exploited in such a manner that the target keypoint 53 remains the same across successive images; it always corresponds to the same location on the user body.
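  • As a purely illustrative sketch (not part of the claimed method), the following Python fragment shows how such keypoints might be obtained with the MediaPipe implementation of BlazePose, one of the algorithms cited above; the specific library calls, landmark names, and the choice of the right wrist as target keypoint are assumptions made for the example only:

```python
# Illustrative only: extracting normalized keypoint coordinates (base frame FB)
# with the MediaPipe BlazePose implementation (API details are assumptions).
import cv2                # frames grabbed with OpenCV from a standard RGB webcam
import mediapipe as mp    # provides the BlazePose pose tracking model

mp_pose = mp.solutions.pose

def extract_keypoints(frame_bgr, pose):
    """Return a dict of (x, y, z) keypoint coordinates, with x and y normalized
    to [0, 1] in the image (frame FB), or None if no person is detected."""
    results = pose.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if results.pose_landmarks is None:
        return None
    lm = results.pose_landmarks.landmark
    names = ["LEFT_EAR", "RIGHT_EAR",            # bound the reference segment 45
             "LEFT_SHOULDER", "RIGHT_SHOULDER",  # used for the user frame FU
             "LEFT_HIP", "RIGHT_HIP",
             "RIGHT_WRIST"]                      # assumed target keypoint 53
    return {n: (lm[mp_pose.PoseLandmark[n]].x,
                lm[mp_pose.PoseLandmark[n]].y,
                lm[mp_pose.PoseLandmark[n]].z) for n in names}

if __name__ == "__main__":
    cap = cv2.VideoCapture(0)
    with mp_pose.Pose(model_complexity=1) as pose:
        ok, frame = cap.read()
        if ok:
            print(extract_keypoints(frame, pose))
    cap.release()
```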
  • a reference frame is associated with a given thing (e.g., an image, the user body, etc.), to which a coordinate system is attached.
  • a reference frame is defined by a reference point at the origin and a reference point at a unit distance along each of the n coordinate axes.
  • each reference frame is preferably defined in accordance with a Cartesian coordinate system, as usual in the field.
  • the coordinate transformation performed at step S60 is a reference frame transformation.
  • the computerized system 1 has no information about the absolute distance of the user from the camera 2. Because the scheme used to compute the coordinates in the base reference frame F B is agnostic to the true, absolute distance of the user from the camera, the present method relies on the change in the relative size (i.e., the apparent dimension) of the anthropometric feature 6 of the user in the base reference frame to compensate for changes in the user’s distance to the camera 2.
  • the transformed coordinates are preferably rescaled by applying a scaling function.
  • Various scaling functions may be contemplated. In general, such functions can be defined as an algorithmic procedure, which involves a numerical or analytical calculation. Preferably, this function is defined analytically, to speed up calculations.
  • the scaling function used may be a rational polynomial function or a simple polynomial function f.
  • This function may notably involve one or more constants c1, c2, ..., d0, s0, where d0 (respectively s0) corresponds to an initial relative length (respectively an initial relative area) of a constant (i.e., rigid) anthropometric feature 6, determined in accordance with at least two (respectively at least three) selected points of the keypoints 51 – 54. Two points define a line, while three coplanar points define a boundary of an area.
  • the variables d and s are updated for each image.
  • the anthropometric data capture a relative size of a geometric element, which is bounded by at least two keypoints 51, selected from the identified keypoints 51 – 54, see FIGS.2 and 3.
  • the transformed coordinates of the keypoints are rescaled S70 according to a scaling function, which is preferably defined analytically.
  • the scaling function may take as argument a ratio of a reference size to the relative size of the anthropometric feature 6, as computed in the base reference frame F B for each image. For example, one may rely on a simple scaling factor of the form d0/d.
  • the target keypoint 53 is preferably distinct from the two keypoints bounding the line segment of length d.
  • the two keypoints 51 used to compute d correspond to ears in the example of FIGS.2 and 3.
  • the relative size of the anthropometric feature is a relative length 45 of the line segment bounded by the two selected keypoints 51, such that the ratio d0/d is a ratio of a reference length d 0 to the length d as computed for each image considered. More generally, though, these two keypoints may also correspond to eyes, iliac crests, greater trochanters, etc., and accordingly bound a rigid line segment of the user body.
  • the length d is essentially constant in the user reference frame, subject to small (i.e., negligible) variations. In other words, said length is a distance between two keypoints on the user body that are essentially rigidly connected.
  • the reference size d0 corresponds to a value of the size d as computed for an initial image of the images repeatedly acquired by the camera 2.
  • d is typically updated at a rate of 15 to 20 fps, as further discussed below.
  • the length d 0 may actually be any predefined length.
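  • As a minimal sketch of this rescaling step (the function and symbol names below are illustrative assumptions; the simple d0/d factor is the scaling factor mentioned above):

```python
# Illustrative sketch: rescaling keypoint coordinates by the ratio d0/d of the
# reference length to the current apparent length of the anthropometric feature.
import numpy as np

def relative_length(kp_a, kp_b):
    """Apparent length d of the reference segment (e.g., the ear-to-ear
    distance 45), computed from two keypoints in the base frame FB."""
    return float(np.linalg.norm(np.asarray(kp_a) - np.asarray(kp_b)))

def rescale(coords_fu, d, d0):
    """Rescale coordinates expressed in the user frame FU by d0/d (step S70)."""
    return {name: np.asarray(xyz) * (d0 / d) for name, xyz in coords_fu.items()}

# d0 comes from an initial image, d from the current image; when the user steps
# back, d shrinks and the factor d0/d grows, compensating for the distance change.
d0 = relative_length((0.45, 0.30, 0.0), (0.55, 0.30, 0.0))
d = relative_length((0.47, 0.31, 0.0), (0.53, 0.31, 0.0))
print(rescale({"target": (0.10, 0.25, 0.02)}, d, d0))
```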
  • a more efficient alternative is to undistort the coordinates of only the keypoints of interest.
  • the scaling function is devised to compensate for the optical distortion of the camera 2, at least partly, whereby the correction is performed S70 upon rescaling. That is, the scaling function can be purposely altered to compensate for the optical distortion of the hardware camera.
  • the function can again be devised as a rational polynomial function.
  • a polynomial form is preferred, given that radial distortion can be modelled using polynomial equations.
  • the parameters of the function can be adjusted to partly compensate for optical distortion.
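  • The sketch below illustrates one way to undistort only the keypoints of interest with OpenCV; the intrinsic matrix and distortion coefficients are hypothetical values standing in for a prior camera calibration, which the description does not prescribe:

```python
# Illustrative sketch: undistorting only selected keypoints (pixel coordinates)
# rather than the whole image; K and dist are hypothetical calibration results.
import numpy as np
import cv2

K = np.array([[900.0, 0.0, 640.0],
              [0.0, 900.0, 360.0],
              [0.0,   0.0,   1.0]])            # hypothetical camera intrinsics
dist = np.array([-0.25, 0.08, 0.0, 0.0, 0.0])  # hypothetical k1, k2, p1, p2, k3

def undistort_keypoints(pixel_points):
    pts = np.asarray(pixel_points, dtype=np.float64).reshape(-1, 1, 2)
    # Passing P=K keeps the corrected points in pixel coordinates.
    return cv2.undistortPoints(pts, K, dist, P=K).reshape(-1, 2)

print(undistort_keypoints([(100.0, 80.0), (640.0, 360.0)]))
```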
  • the camera 2 is configured to repeatedly acquire S20 images at a given frame rate R1, which is usually fixed. Still, the pose tracking algorithm can be instructed to execute S30 for each image of only a subset of the repeatedly acquired images.
  • the pose tracking algorithm is executed S30 at an average rate R2 that is strictly less than R1 (R2 < R1).
  • R2 is chosen so that R1/2 ≤ R2 ≤ 2R1/3. That is, the refresh rate of the relative size of the anthropometric feature (e.g., the length d) is lowered to allow more time for the pose tracking algorithm to perform inferences, i.e., to infer the semantic keypoints.
  • the actual frame rate of the camera will typically be equal to 30 fps, whereas the rate for refreshing d is preferably set between 15 and 20 fps.
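  • A minimal frame-subsampling sketch is shown below; the choice of running inference on every second frame (30 fps camera, 15 fps inference) is an assumption consistent with the figures above:

```python
# Illustrative sketch: the camera runs at R1 = 30 fps, while the pose tracking
# algorithm is executed on every 2nd frame only (R2 = 15 fps, i.e. R1/2 <= R2 <= 2R1/3).
import cv2

INFERENCE_EVERY = 2   # assumed subsampling factor

def frame_loop(process_keypoints, max_frames=300):
    cap = cv2.VideoCapture(0)
    for index in range(max_frames):
        ok, frame = cap.read()
        if not ok:
            break
        if index % INFERENCE_EVERY == 0:   # other frames are dropped (S25, S35)
            process_keypoints(frame)       # placeholder for steps S30-S90
    cap.release()
```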
  • the Nyquist sampling theorem advocates using a sampling frequency that is at least twice the highest frequency of interest.
  • R0 is a maximal user interaction frequency of interest for the problem at issue
  • R0 can be regarded as the Nyquist or folding frequency for the problem at hand.
  • a coordinate transformation is applied (at step S60, FIG. 7) to the updated coordinates of the keypoints to obtain transformed coordinates of the keypoints in the user reference frame F U .
  • This coordinate transformation is a transformation between the base reference frame FB and the user reference frame FU.
  • this transformation can be simply determined based on the updated coordinates (in the base reference frame FB) of keypoints 52, 52r selected from the keypoints 51 – 54.
  • the selected keypoints 52, 52r include three non-collinear keypoints, which define a reference plane of the user, as shown in FIGS.2 and 3.
  • the method further comprises determining S40, for each image, the coordinate transformation between the frame FB and frame FU based on the updated coordinates of the selected keypoints 52, 52r, as computed in the frame FB.
  • the coordinate transformation is then applied S60 for the reference frame F B .
  • unit vectors can be defined for the user reference frame FU, from the selected keypoints 52, 52r. One of these keypoints defines the origin of the user reference frame FU.
  • a similar transformation is determined and then applied S60 for each new reference frame FB (i.e., for each new image considered). So, in practice, for each new image considered, the transformation (e.g., a translation and a rotation) is determined S40 and applied S60. Then, one rescales S70 the keypoint coordinates according to the anthropometric feature size, and subsequently maps S80 the target keypoint to a position on the display screen.
  • the frame FU (attached to the user's body) can be determined from the 3D coordinates of selected keypoints, as defined in the frame F B .
  • the selected keypoints 52, 52r define axes x', y', z' of the frame F U , as illustrated in FIG. 3.
  • Corresponding unit vectors are defined from the axes, given an origin 52r.
  • x', y', z' denote both the axes of the frame FU and the corresponding unit vectors in FIGS.2 and 3.
  • x, y, z denote both the axes of the frame F B and the corresponding unit vectors in FIG.3.
  • points 52, 52r can easily be determined thanks to outputs from the pose tracking algorithm. For example, see FIG.
  • y' can be taken as the axis that is parallel to the shoulder-to-shoulder axis, i.e., the axis extending through the keypoints corresponding to the right shoulder and the left shoulder.
  • the axis x' can be taken as a vector parallel to the shoulder-to-hip axis.
  • the remaining axis/unit vector z' is perpendicular to the plane spanned by x' and y'. So, a suitable keypoint selection makes it possible to determine the coordinates of the three unit vectors x', y', z' in the frame F B .
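  • A possible NumPy sketch of this construction follows; the orthonormalization order (shoulder axis first, then the shoulder-to-hip direction) is an assumption, and the keypoint values are made up for the example:

```python
# Illustrative sketch: building orthonormal unit vectors x', y', z' of the user
# frame FU from three non-collinear keypoints expressed in the base frame FB.
import numpy as np

def user_frame_basis(r_shoulder, l_shoulder, r_hip):
    r_shoulder, l_shoulder, r_hip = map(np.asarray, (r_shoulder, l_shoulder, r_hip))
    y_axis = l_shoulder - r_shoulder                  # parallel to the shoulder-to-shoulder axis
    y_axis = y_axis / np.linalg.norm(y_axis)
    x_raw = r_hip - r_shoulder                        # roughly parallel to the shoulder-to-hip axis
    x_axis = x_raw - np.dot(x_raw, y_axis) * y_axis   # remove the component along y'
    x_axis = x_axis / np.linalg.norm(x_axis)
    z_axis = np.cross(x_axis, y_axis)                 # perpendicular to the (x', y') plane
    return x_axis, y_axis, z_axis

x_p, y_p, z_p = user_frame_basis((0.60, 0.30, 0.00), (0.40, 0.30, 0.00), (0.58, 0.60, 0.05))
# With the canonical basis for FB, [e1 e2 e3] is the identity, so the rotation
# matrix R = [e1' e2' e3'] . [e1 e2 e3]^T is simply the stacked basis vectors:
R = np.column_stack([x_p, y_p, z_p])
print(R)
```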
  • the coordinate transformation determined at step S40 preferably combines a rotation and a translation.
  • the rotation can be defined based on Euler angles between the base reference frame FB and the reference frame of the user 5.
  • the translation can be defined based on coordinates of a reference keypoint in the base reference frame FB.
  • a reference plane (x', y') is defined by three non-collinear keypoints 52, 52r, which are selected from the keypoints 51 – 54, see FIG. 2.
  • the reference keypoint 52r is selected among the keypoints 52 and corresponds to the origin of the user reference frame.
  • the reference keypoint 52r and the reference plane (x', y') are used to define the user reference frame FU, which translates and rotates together with the user 5. So, the user reference frame FU has its origin at the reference keypoint 52r, and its XY plane coincides with the reference plane.
  • the movement of the cursor 71 is isolated (i.e., made independent) from any rotation of the user around the human longitudinal axis (i.e., cephalocaudal axis) when standing upright in the real-world space. That is, a movement of the target keypoint 53 results in a consistent movement of the cursor 71, independently of any rotation of the user 5 in front of the camera 2.
  • the movement of the cursor is isolated from any translation of the user parallel to the camera plane. Rigid translations of the whole body in the XY plane do not result in undesired movements of the cursor 71. This improves user interactions and the user experience.
  • the rotation matrix can be computed as R = [e1' e2' e3'] · [e1 e2 e3]^T, where:
  • [e1' e2' e3'] is a matrix whose columns are the basis vectors of the x'y'z' coordinate system
  • [e1 e2 e3] is a matrix whose columns are the basis vectors of the xyz coordinate system
  • T denotes the transpose operation.
  • the vectors e1’, e2’, and e3’ are orthonormal.
  • the coordinate transformation is defined and applied S60 as a single matrix multiplication, as discussed in detail in Section 2.
  • This repetitive operation can advantageously be offloaded to a GPU, if any.
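  • A sketch of such a combined transformation is given below; the convention used (mapping base-frame coordinates into the user frame whose origin is the reference keypoint, via p' = R^T (p − origin)) is one possible choice, not a statement of the claimed formula:

```python
# Illustrative sketch: rotation + translation combined into one 4x4 homogeneous
# matrix, applied to all keypoints in a single matrix multiplication (GPU-friendly).
import numpy as np

def make_transform(R, origin_fb):
    T = np.eye(4)
    T[:3, :3] = R.T                            # assumed convention: p' = R^T (p - origin)
    T[:3, 3] = -R.T @ np.asarray(origin_fb)
    return T

def apply_transform(T, keypoints_fb):
    pts = np.asarray(keypoints_fb, dtype=float)             # shape (N, 3)
    homog = np.hstack([pts, np.ones((pts.shape[0], 1))])    # shape (N, 4)
    return (T @ homog.T).T[:, :3]                           # one matmul for all keypoints

R = np.eye(3)                                  # hypothetical rotation from step S40
origin = np.array([0.60, 0.30, 0.00])          # reference keypoint 52r in FB
keypoints = np.array([[0.50, 0.20, 0.01], [0.45, 0.25, 0.00]])
print(apply_transform(make_transform(R, origin), keypoints))
```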
  • the GUI 7 may display one or more additional graphical elements 72 on the display device 4, beside the first graphical element 71.
  • various algorithmic recipes can be applied to allow the user to perform an action on the second graphical element 72, thanks to control exerted through the first element 71.
  • the method may further comprise running S130 – S140 a monitoring algorithm to detect (S130: Yes; S140: Yes) a potential action to be performed S150 on the second graphical element 72 based on a relative position of the first graphical element 71 and the second graphical element 72 in the reference frame FS of the display.
  • Said action may for instance be a mere selection of the element 72, the triggering of an action (e.g., starting an execution), or the display of a drop-down menu relating to the element 72.
  • the monitoring algorithm is executed concurrently with the main flow, see step S110 in FIG. 8. It may notably involve a computer vision algorithm and/or a timer triggering the desired action.
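  • A minimal sketch of the timer-based variant is given below; the dwell time and the hit radius are illustrative assumptions, not values taken from the description:

```python
# Illustrative sketch: dwell-time monitoring. The action on the second graphical
# element is triggered once the cursor has hovered over it for long enough.
import time

DWELL_SECONDS = 1.5   # assumed dwell duration
HIT_RADIUS = 40       # assumed pixel radius within which the positions "coincide"

class DwellMonitor:
    def __init__(self):
        self.hover_start = None

    def update(self, cursor_xy, target_xy):
        """Return True when the action (step S150) should be triggered."""
        dx = cursor_xy[0] - target_xy[0]
        dy = cursor_xy[1] - target_xy[1]
        over = (dx * dx + dy * dy) ** 0.5 <= HIT_RADIUS
        if not over:
            self.hover_start = None
            return False
        if self.hover_start is None:
            self.hover_start = time.monotonic()
        return (time.monotonic() - self.hover_start) >= DWELL_SECONDS
```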
  • the method instructs, for each image subsequently obtained (S20, S25) from the camera 2, to execute S30 the pose tracking algorithm and rescale S70 the subsequently transformed coordinates with a view to updating S80, S90 a position of the second graphical element 72, as so far done in respect of the first graphical element 71.
  • attractors can advantageously be used to ease manipulations by the user.
  • the position of the first graphical element 71 can be updated S80 according to an attractor field assigned to the second graphical element 72.
  • the update step S80 additionally makes use of the rescaled coordinates of the target keypoint 53, as explained earlier.
  • Attractors can be contemplated, which are known per se. However, it is preferred to rely on an attractor field that is devised so as to be minimal at the centre of the second graphical element 72 and beyond an attraction field boundary surrounding the second graphical element 72, and be maximal at an intermediate distance between the centre of the second graphical element 72 and the attraction field boundary, as illustrated in FIG.5.
  • Use can for instance be made of a cosine function, which is maximal at the half distance to the centre of the second element 72, see FIG. 6.
  • the function actually depicted is ½ (1 − cos(2πu)), where u corresponds to the normalized radial distance (x-axis), while the value of the function reflects the attraction magnitude (y-axis).
  • FIG.7 shows a preferred flow.
  • the GUI starts running at step S10.
  • a loop is started at step S20, whereby the camera repeatedly acquires images. Only a subset of the images produced are fed S30 to the pose tracking algorithm, for it to identify (first time) and then update keypoint coordinates. I.e., some of the images are dropped (S25: No, S35).
  • the coordinate transformation is determined at step S40 and then applied (step S60) to the updated coordinates of the keypoints, so as to obtain transformed coordinates of the keypoints in the user reference frame.
  • anthropometric data are computed at step S50; the transformed coordinates of the keypoints are rescaled at step S70, according to anthropometric data.
  • the position of a graphical element in the display reference frame is accordingly updated at step S80 and a corresponding signal is sent S90 to the display device to move the graphical element displayed.
  • a monitoring algorithm can be run in parallel to the flow of FIG. 7 (the latter corresponding to Flow #1, step S110 in FIG. 8).
  • the relative position of the first and second graphical elements is monitored at step S120, which is continually performed.
  • a computer vision algorithm is run, and/or a timer is triggered.
  • the method checks S140 whether a certain criterion is met. I.e., is the computer vision algorithm able to identify a predetermined gesture? Has a predefined time period elapsed? If so (S140: Yes), an action (e.g., selection, start execution, display drop-down menu, etc.) is triggered at step S150 in respect of the second graphical element.
  • the algorithm keeps on monitoring S120 the relative position of the graphical elements, irrespective of the outcomes of steps S130 and S140.
  • the invention can be embodied as a computerized system 1 for enabling a contactless user interface.
  • the computerized system 1 comprises a camera 2, a display device 4, and processing means 230.
  • the system 1 may be a desktop computer 3, a smartphone, a tablet, a laptop, etc.
  • the processing means 230 are configured to execute S10 a GUI 7, instruct the camera 2 to repeatedly acquire S20 images of a user 5, and perform steps (see, e.g., steps S25 – S90 in FIG. 7) as described earlier in reference to the present methods.
  • a first module 231 is run to execute the pose tracking algorithm, which causes the coordinate update module 232 to update the coordinates of the keypoints in the base reference frame FB.
  • the module 233 is run to compute anthropometric data from the updated coordinates.
  • the module 234 computes and applies the coordinate transformation to obtain transformed coordinates of the keypoints in the user reference frame FU.
  • the transformed coordinates are then rescaled (according to the anthropometric data) by the module 235, whereby a tracking module 236 can update positions of the graphical elements to be displayed in the display reference frame FS in accordance with the rescaled coordinates of the target keypoint. Additional modules 237 – 238 may be involved.
  • an optical distortion module 237 may be run to correct keypoint coordinates.
  • a monitoring module 238 may be executed, whereby a computer vision module 238 may be run to identify specific user gestures.
  • this module 238 may involve a timer to trigger an action in respect of a graphical element displayed, as explained earlier.
  • the signals consumed and produced by the various modules 231 – 238 and components 2, 4 transit through input/output (I/O) management units 260 – 270, see also FIG. 9.
  • the updated positions of the graphical elements can be sent to the display device 4, for it to accordingly display movements of elements manipulated by the user, in operation. Additional features of the present computerized systems 1 and computer program products are described in section 3.
  • a computer vision algorithm processes the camera’s images in real-time, identifying and tracking a set of keypoints 51 – 54 on the body, such as the hands, wrists, shoulders, hips, knees, ankles, and feet.
  • the position of a selected keypoint (i.e., the target keypoint 53) is used to control the position of the cursor 71 on the display screen.
  • the proposed method allows an intuitive use of the cursor 71 through the following innovations: • The movement of the cursor is independent of the rotation of the user in space; • The movement of the cursor is independent of translations of the user parallel to the camera plane; and • The movement of the cursor is independent of translations of the user in the direction perpendicular to the camera plane.
  • when the position of the cursor overlaps with that of a button or any other interactive element on the display screen, the user can interact with it in two ways: (a) by performing a specific hand gesture (e.g., opening and closing the hand, which would correspond to a mouse click); or (b) by overlapping the cursor’s position with that of the button for a predetermined amount of time.
  • the two mechanisms may possibly be concurrently implemented.
  • the reference distance corresponds to the anthropometric feature, i.e., a distance between two keypoints on the body that are rigidly connected (e.g., the ears, the eyes).
  • 2D images from the camera 2 can be associated with a base reference frame F B .
  • the keypoints are generated by the pose tracking algorithm. They consist of XYZ coordinates for a set of points on the human body, identified from the 2D image of the RGB camera 2. The XY coordinates of each keypoint are set between 0 and 1, this depending on the position of the keypoint in the image. The top left corner is [0, 0], and the bottom right corner is [1, 1].
  • the z- coordinate encodes information in the direction perpendicular to the camera frame.
  • the display screen is associated with a further frame F S .
  • the user can control the position of the cursor 71 by moving the target keypoint 53.
  • the aim is to map the position of a target point 53 in the physical space to the position of the cursor 71 on the display screen and allow the user to interact with elements displayed on the display screen.
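  • One possible mapping is sketched below; the gain, the screen anchor, and the clamping are tuning choices assumed for the example rather than features of the description:

```python
# Illustrative sketch: mapping the rescaled (x, y) coordinates of the target
# keypoint (user frame FU) to pixel coordinates in the display frame FS.
def to_screen(target_xy, screen_w, screen_h, gain=2.0, anchor=(0.5, 0.5)):
    x, y = target_xy
    px = (anchor[0] + gain * x) * screen_w
    py = (anchor[1] + gain * y) * screen_h
    # Clamp so the cursor 71 always stays on the display screen.
    return (min(max(int(px), 0), screen_w - 1),
            min(max(int(py), 0), screen_h - 1))

print(to_screen((0.08, -0.05), 1920, 1080))   # e.g. cursor position in pixels
```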
  • the movement of the cursor should ideally be isolated from a rotation of the user in the physical space. I.e., a movement of the target keypoint should ideally result in a consistent movement of the cursor, independently of any rotation of the user with respect to the camera; • Furthermore, the movement of the cursor should ideally be isolated from translations of the user parallel to the camera plane. Rigid translations of the whole body in the XY direction should not result in movement of the displayed cursor 71; • The movement of the cursor should be isolated from the translation of the user along the camera axis, i.e., in the direction perpendicular to the camera plane.
  • a movement of the target keypoint should result in a consistent movement of the cursor, independently of the distance of the user from the camera plane; and • Lack of fine motor control, latency in the computation of the keypoints, and/or jittering of the keypoints, may prevent the user from guiding the pointer 71 towards a desired interactive element in a smooth and intuitive way.
  • the following describes an adaptive mapping function to compensate for translations and rotations of the user.
  • the proposed solution is to transform the coordinates of the target keypoint 53 from the base reference frame FB, defined by axes x, y, z (unit vectors of the frame FB are defined along said axes), to the user reference frame FU attached to the user’s body, and defined by axes x', y', z'.
  • unit vectors of the frames FB and FU are defined along the axes x, y, z and x', y', z', respectively, whereby the frames FB and FU can be respectively referred to as the xyz and x'y'z' frames.
  • the coordinate transformation sought can be defined, in homogeneous coordinates, as [p'; 1] = M · [p; 1], where p collects the keypoint coordinates in the frame FB, p' collects the corresponding coordinates in the frame FU, and M = [R, t; 0, 1] combines a 3 × 3 rotation matrix R and a 3 × 1 translation vector t.
  • the matrix M on the right-hand side represents the combined transformation.
  • the rotation matrix R can be defined using a combination of rotations around the x-axis, y-axis, and z-axis.
  • This transformation ensures that the cursor’s position is not affected by translations or rotations of the user in space, but only by relative movements between the target keypoint and the reference keypoint.
  • the following describes an adaptive mapping function to estimate the depth thanks to a rigid anthropometric feature, so as to isolate the cursor movements from any translation of the user along the camera axis.
  • since the z-coordinates in the frame FB do not indicate the absolute distance of the user from the camera, we may advantageously rely on the change in size of a rigid object as a proxy for the user’s distance from the camera.
  • This object can for instance be chosen as the segment 45 (also called reference segment, see FIG.3).
  • the distance d is regularly updated, at a rate of 15 to 20 fps. As a result, the target position is not affected by the distance of the user from the camera.
  • Attractors may prevent the user from guiding the pointer towards a desired element 72 in a smooth and intuitive way.
  • Targets can be any elements on the display screen, such as buttons and widgets, which the user can interact with through the cursor 71.
  • the added attractors may act as gravity fields, pulling the cursor towards the centre of the interactive element when the cursor is within an attraction distance and thus caught by the attraction field. Care should be taken to carefully design such attractors to prevent the user from accidentally pressing the wrong button or getting stuck in the element 72.
  • This yields a corrected pointer position, defined as the sum of the raw pointer position and an attractor offset.
  • Use can advantageously be made of a cosine velocity attractor for the computation of this offset, as a function of the normalized radial distance of the pointer from the attractor centre, where c denotes the coordinates of the centre of the attractor (see the black dot at the centre of the shape 72 in FIG.4), p points to the pointer position (see the white dot in FIG.4), and pb points to the projection (see the patterned dot in FIG. 4) of p to the attraction field border 80. Note, the projection is actually defined as the extension of p to the boundary 80.
  • pb points to a point where the boundary 80 is intersected by the axis passing through p and originating from the centre of the element 72.
  • the constant C is adjusted to control the rate of attraction of the cursor 71 to the centre of graphical element 72.
  • the behaviour of the above attractor can be described as follows.
  • the cursor 71 is not attracted when located at the centre of the attractor (to prevent it from getting stuck at the shape 72) or at (or beyond) the boundary 80 of the attractor. However, the cursor 71 is attracted when located between the boundary 80 and the centre of the attractor. Note, this relation affects the velocity of the offset.
  • a cosine-like function determines the amplitude of the attraction field, see FIG.5.
  • this function is equal to 0 at u = 0 and at u = 1, and equal to 1 at u = 0.5.
  • Such values correspond to the centre of the attractor, the edge of the attractor, and the edge of the attraction field, i.e., the boundary 80, assuming the attraction field has a diameter that is twice the diameter of the attractor.
  • this function is clipped to 0 for u > 1 and undefined for u < 0.
  • the accumulation of offset over multiple interactions with attractive elements may sometimes lead to the pointer drifting off the display screen. To counteract this, we may advantageously apply an exponential decay to the offset.
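  • The sketch below combines a cosine-shaped attractor offset with such an exponential decay; the gain C, the decay factor, and the exact update rule are illustrative assumptions consistent with the behaviour described above (zero pull at the centre and at the field boundary, maximal pull halfway):

```python
# Illustrative sketch: cosine attractor offset with exponential decay of the
# accumulated offset (to prevent the pointer from drifting off the screen).
import math

C = 8.0        # assumed gain controlling the rate of attraction
DECAY = 0.95   # assumed per-frame exponential decay of the accumulated offset

def update_offset(pointer, centre, field_radius, offset):
    dx, dy = pointer[0] - centre[0], pointer[1] - centre[1]
    dist = math.hypot(dx, dy)
    u = dist / field_radius                     # normalized radial distance
    if 0.0 < u < 1.0:
        amplitude = 0.5 * (1.0 - math.cos(2.0 * math.pi * u))  # 0 at u=0 and u=1, 1 at u=0.5
        offset = (offset[0] - C * amplitude * dx / dist,       # pull towards the centre
                  offset[1] - C * amplitude * dy / dist)
    return (offset[0] * DECAY, offset[1] * DECAY)

off = (0.0, 0.0)
for _ in range(5):   # a few frames with the pointer inside the attraction field
    off = update_offset(pointer=(520, 400), centre=(500, 380), field_radius=80, offset=off)
print(off)
```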
  • the method steps S10 – S150 described earlier in reference to FIGS.7 and 8, are implemented in (or triggered by) software, e.g., one or more executable programs, executed by processing means 230 of the computerized system 1, which may include one or more computerized units such as depicted in FIG. 9.
  • the system 1 consists of a single unit 200, such as a smartphone, a tablet, a laptop, or a desktop computer, which integrates all required components, starting with the camera 2, the display device 4, and processing means 230.
  • Computerized devices and components can be suitably configured for implementing embodiments of the present invention as described herein.
  • the methods described herein are partly non-interactive, i.e., partly automated.
  • Automated parts of such methods can be implemented in software, hardware, or a combination thereof.
  • automated parts of the methods described herein are implemented in software, as a service or an executable program (e.g., an application), the latter executed by suitable digital processing devices.
  • a typical computerized device (or unit) 200 may include a processor 230 and a memory 250 (possibly including several memory units) coupled to one or more memory controllers 240.
  • the processor 230 is a hardware device for executing software loaded in a main memory of the device.
  • the processor can be any custom made or commercially available processor.
  • the processor may notably be a central processing unit (CPU), as assumed in FIG. 9.
  • the memory 250 of the unit 200 typically includes a combination of volatile memory elements (e.g., random access memory) and non-volatile memory elements, e.g., a solid-state device.
  • the software in memory may include one or more separate programs, each of which comprises executable instructions for implementing functions as described herein.
  • the software in the memory includes methods described herein in accordance with exemplary embodiments and a suitable OS.
  • the OS essentially controls the execution of other computer (application) programs and provides scheduling, input-output control, file and data management, memory management, and communication control and related services. It may further control the distribution of tasks to be performed by various processing units.
  • the computerized unit 200 can further include a display controller 282 coupled to a display device 4.
  • the computerized unit 200 further includes a network interface 290 or transceiver for coupling to a network (not shown).
  • the computerized unit 200 will typically include one or more input and/or output (I/O) devices 2, 210, 220 (or peripherals, including the camera 2) that are communicatively coupled via a local I/O controller 260.
  • a system bus 270 interfaces all components.
  • the local interface may include address, control, and/or data connections to enable appropriate communications among the aforementioned components.
  • the I/O controller 135 may have additional elements, which are omitted for simplicity, such as controllers, buffers (caches), drivers, repeaters, and receivers, to allow data communication. P187721PC00 27
  • one or more processing units 230 execute software stored within the memory of the computerized unit 200, to communicate data to and from the memory 250 and/or the storage unit 255 (e.g., a hard drive and/or a solid-state memory), and to generally control operations pursuant to software instruction.
  • the methods described herein and the OS are read (in whole or in part) by the processing elements, typically buffered therein, and then executed.
  • Computer readable program instructions described herein can be downloaded to processing elements from a computer readable storage medium, via a network, for example, the Internet and/or a wireless network.
  • a network adapter card or network interface 290 may receive computer readable program instructions from the network and forward such instructions for storage in a computer readable storage medium 255 interfaced with the processing means 230.
  • These computer readable program instructions may be provided to one or more processing elements 230 as described above, to produce a machine, such that the instructions, which execute via the one or more processing elements, create means for implementing the functions or acts specified in the blocks of the flowcharts of FIGS. 7, 8 and the block diagram of FIG. 10.
  • These computer readable program instructions may also be stored in a computer readable storage medium.
  • the flowchart and the block diagram in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of the computerized unit 200, methods of operating it, and computer program products, according to various embodiments of the present invention.
  • each computer-implemented block in the flowcharts or the block diagram may represent a (sub)module, or a set of instructions, which comprise(s) executable instructions for implementing the functions or acts specified therein.
  • the functions or acts mentioned in the blocks may occur out of the order specified in the figures. For example, two blocks shown in succession may actually be executed in parallel, concurrently, or even in reverse order, depending on the functions involved and the algorithmic optimization retained. Note also that each block, and combinations of blocks, can be distributed among special-purpose hardware components.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Geometry (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention is notably directed to a computer-implemented method of enabling a contactless user interface. The method comprises executing a graphical user interface to display a graphical element on a display device and instructing a camera to repeatedly acquire images of a user. The method further comprises instructing to perform the following steps, for each image of at least some of the images acquired. First, a pose tracking algorithm is executed to update coordinates of keypoints (51 – 54) of the user in a base reference frame corresponding to said each image. I.e., each image considered can be associated with a corresponding base frame (F B ). The keypoints notably include a target keypoint (53), which will be used to guide the graphical element. Second, from the updated coordinates, the method computes anthropometric data, which capture a relative size of an anthropometric feature (45) of the user in said each image. The method further applies a coordinate transformation to the updated coordinates to obtain transformed coordinates of the keypoints in a reference frame (F U ) of the user. The reference frame is "attached" to the user body. Next, the method rescales the transformed coordinates according to said anthropometric data. The method then updates a position of the graphical element in a reference frame of the display according to the rescaled coordinates of the target keypoint. Finally, it sends a signal encoding the updated position to the display device to accordingly move the graphical element displayed. The proposed method works with conventional RGB cameras (no depth camera is needed) and is particularly well suited to games and, more specifically, to digital health applications. The invention further concerns related computerized devices and computer program products.

Description

P187721PC00 1 ENABLING A CONTACTLESS USER INTERFACE FOR A COMPUTERIZED DEVICE EQUIPPED WITH A STANDARD CAMERA TECHNICAL FIELD The invention relates in general to the field of computer-implemented methods, computerized systems, and computer program products for enabling a contactless user interface, e.g., using a standard RGB camera. In particular, it is directed to methods relying on changes in the relative size (i.e., an apparent dimension) of a given anthropometric feature of the user to compensate for changes in the user’s distance to the camera, as well as other user movements including rotations and translations parallel to the camera plane. BACKGROUND Gesture recognition aims at interpreting human gestures (typically originating from the face or hand) through techniques of computer vision and image processing. Concepts of contactless (also called <touchless=) user interfaces have been proposed, which are based on gesture control. The goal is to allow a user to control a computer via body motion and gestures without physically touching any device, e.g., a keyboard, a mouse, a display screen, or any peripheral device. For example, kinetic user interfaces (KUIs) have been proposed, which allow users to interact with computing devices through the motion of objects and bodies. Such interfaces typically involve sensorised gloves, stereo cameras, and gesture-based controllers. A drawback of such approaches is that they require specific equipment. As a result, they are not suited for a widespread adoption of digital technology such as digital healthcare. In that respect, the concept of <therapy at home= is slow to take hold because patients lack guidance and motivation. Adding hardware constraints will only impede further the uptake of home therapy. Therefore, there is a need to enable a contactless user interface with simple equipment (i.e., laptop, tablet, or smartphone), which does not require additional devices, and is simple to use for patients. P187721PC00 2 SUMMARY According to a first aspect, the present invention is embodied as a computer-implemented method of enabling a contactless user interface. The method comprises executing a graphical user interface to display a graphical element on a display device and instructing a camera to repeatedly acquire images of a user. The method further comprises instructing to perform the following steps, for each image of at least some of the images acquired. First, a pose tracking algorithm is executed to update coordinates of keypoints of the user in a base reference frame corresponding to said each image. I.e., each image considered can be associated with a corresponding base frame. The keypoints notably include a target keypoint, which will be used to guide the graphical element. Second, from the updated coordinates, the method computes anthropometric data, which capture a relative size of an anthropometric feature of the user in said each image. The method further applies a coordinate transformation to the updated coordinates to obtain transformed coordinates of the keypoints in a reference frame of the user. The reference frame of the user can be regarded as a frame that is <attached= to the user body. Next, the method rescales the transformed coordinates according to said anthropometric data. The method then updates a position of the graphical element in a reference frame of the display according to the rescaled coordinates of the target keypoint. 
Finally, it sends a signal encoding the updated position to the display device to accordingly move the graphical element displayed. This way, a touchless user interface control is enabled, whereby the proposed method does not require any physical contact between the user and any physical device. Unlike other markerless systems relying on depth cameras or inertial measurement units, here the method can typically rely on images acquired by a conventional camera, e.g., a standard RGB camera. Still, changes in the relative size (i.e., the apparent dimension) of the anthropometric feature of the user are used as a proxy to compensate for user translations along the camera axis. Additional user movements (e.g., rotations and translations parallel to the camera plane) can possibly be compensated, as in embodiments discussed below. In embodiments, the computed anthropometric data include a relative size of a geometric element, which is bounded by at least two selected ones of the keypoints. The transformed coordinates are simply rescaled according to a scaling function taking as argument a ratio of a reference size to said relative size as computed in the base reference frame for said each image. Such a ratio of the reference size to the relative size is efficient and quick to compute, and reflects the optical reality (as seen by the camera) well, inasmuch as the apparent size of an object is inversely proportional to its distance to the camera. Preferably, said relative size is a relative length of a line segment bounded by two selected ones of the keypoints, and said ratio is a ratio of a reference length to said length as computed for said each image. In embodiments, the reference size corresponds to a value of said relative size as computed for an initial image of the repeatedly acquired images. In simpler, more efficient variants, the reference size is determined in accordance with one or each of a width and a height of said each image. In embodiments, the method further comprises, for said each image, applying a correction to compensate for an optical distortion (by the camera) of the updated coordinates of the keypoints in the base reference frame. In preferred embodiments, the camera is configured to repeatedly acquire said images at a given frame rate R1. The pose tracking algorithm is instructed to execute for each image of only a subset of the repeatedly acquired images, at an average rate R2 that is strictly less than R1. Preferably, R1/2 ≤ R2 ≤ 2 R1/3. This way, more time is allowed for the pose tracking algorithm to infer the keypoints. In embodiments, the method further comprises, for said each image, determining the coordinate transformation between the base reference frame and the reference frame of the user based on the updated coordinates of selected ones of the keypoints of the user in the base reference frame. The selected ones of the keypoints include three non-collinear keypoints, which define the reference frame of the user. Preferably, the applied coordinate transformation combines a rotation and a translation, the rotation is defined based on Euler angles between said base reference frame and the reference frame of the user, and the translation is defined based on coordinates of a reference keypoint in the base reference frame. The reference keypoint is selected from the three non-collinear keypoints and corresponds to the origin of the reference frame of the user.
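By way of illustration only, the following minimal Python sketch shows how such a relative size and the corresponding scaling ratio could be computed from two keypoints. The function names (relative_size, scale_ratio) and the 2D tuple representation of keypoints are illustrative assumptions, not features of the claimed method.

    import math

    def relative_size(kp_a, kp_b):
        # Apparent length, in the base reference frame, of the line segment
        # bounded by two selected keypoints (e.g., the two ears).
        return math.hypot(kp_a[0] - kp_b[0], kp_a[1] - kp_b[1])

    def scale_ratio(d, d_ref):
        # Ratio of the reference size to the current relative size (d0/d).
        return d_ref / d

    # Usage: d0 may be taken from the first image processed, then kept fixed:
    #   d0 = relative_size(first_left_ear, first_right_ear)
    #   factor = scale_ratio(relative_size(left_ear, right_ear), d0)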
The coordinate transformation may advantageously be defined and applied as a single matrix multiplication. Where possible, the single matrix multiplication is optionally offloaded to a graphics processor unit, in the interest of computational speed. In preferred embodiments, said graphical element is a first graphical element. Still, the execution of the graphical user interface may cause to display additional graphical elements on the display device, in particular a second graphical element. P187721PC00 4 Preferably, the method further comprises running a monitoring algorithm to detect a potential action to be performed on the second graphical element based on a relative position, in the reference frame of the display, of the first graphical element and the second graphical element. In embodiments, the monitoring algorithm includes one or each of: a computer vision algorithm to detect a particular gesture of the user triggering said action; and a timer triggering said action based on a time duration during which a position of the first graphical element coincides with a position of the second graphical element. Preferably, said potential action is a selection of the second graphical element, and the method further comprises, upon detecting said potential action, instructing, for each image of at least some of the images subsequently acquired by the camera, to execute the pose tracking algorithm and rescale the subsequently transformed coordinates with a view to updating a position of the second graphical element, as previously done in respect of the first graphical element. In preferred embodiments, the position of the first graphical element is updated according to an attractor field of the second graphical element, in addition to said rescaled coordinates of the target keypoint. Preferably, said attractor field is devised so as to be minimal at a centre of the second graphical element and beyond an attraction field boundary surrounding the second graphical element, and maximal at an intermediate distance between the centre of the second graphical element and the attraction field boundary. According to another aspect, the invention is embodied as a computerized system for enabling a contactless user interface, wherein the computerized system comprises a camera, a display device, and processing means. The processing means are configured to execute a graphical user interface to display a graphical element on the display device, instruct the camera to repeatedly acquire images of a user, and perform, for each image of at least some of the images acquired, each of the following steps. First, a pose tracking algorithm is executed to update coordinates of keypoints of the user in a base reference frame corresponding to said each image, the keypoints including a target keypoint. Second, anthropometric data are computed from the updated coordinates. The anthropometric data capture a relative size of an anthropometric feature of the user in said each image. Moreover, a coordinate transformation is applied to the updated coordinates to obtain transformed coordinates of the keypoints in a reference frame of the user. The transformed coordinates are subsequently rescaled according to said P187721PC00 5 anthropometric data. A position of the graphical element in a reference frame of the display is then updated according to the rescaled coordinates of the target keypoint, and a signal encoding the updated position is sent to the display device to accordingly move the graphical element displayed. 
A final aspect concerns a computer program product for enabling a contactless user interface. The computer program product comprises a computer readable storage medium having program instructions embodied therewith. The program instructions are executable by processing means of a computerized system, which further comprises a camera and display device, to cause the computerized system to perform steps as described above in respect of the present methods. BRIEF DESCRIPTION OF THE DRAWINGS These and other objects, features and advantages of the present invention will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings. The illustrations are for clarity in facilitating one skilled in the art in understanding the invention in conjunction with the detailed description. In the drawings: FIG.1 is a diagram illustrating interactions between a user, a camera, processing means, and a display device, to enable a contactless user interface, as in embodiments; FIG.2 shows a user in the physical world. Several virtual features are mapped onto the user by an algorithm, where such features include a reference plane, keypoints, and an anthropometric feature (a reference distance), as in embodiments; FIG. 3 schematically illustrates the mapped features in a base reference frame corresponding to an image acquired by a camera, as involved in embodiments. I.e., the features correspond to a user as seen by the camera; FIG.4 shows an application user interface run on a display device, where the application user interface displays two objects that can be controlled by a user in a contactless manner, as in embodiments; FIG. 5 schematically illustrates an attractor field set around a displayed graphical element to ease user interactions, as involved in embodiments; P187721PC00 6 FIG.6 is a plot of a cosine function, which is used to model the intensity of the attractor along a direction of a vector extending from the centre of a displayed graphical element to the position of a cursor, in embodiments; FIG. 7 is a flowchart illustrating high-level steps of a method of enabling a contactless user interface, according to embodiments; FIG.8 is a flowchart illustrating high-level steps of a monitoring algorithm, allowing a user to trigger an action in respect of a displayed graphical element, as in embodiments; FIG. 9 schematically represents a general-purpose computerized system, suited for implementing one or more method steps as involved in embodiments of the invention; FIG. 10 is a block diagram schematically illustrating selected components of, and modules implemented by, a computerized system for enabling a contactless user interface, according to in embodiments. The accompanying drawings show simplified representations of devices or parts thereof, as involved in embodiments. Similar or functionally similar elements in the figures have been allocated the same numeral references, unless otherwise indicated. Computerized systems, methods, and computer program products embodying the present invention will now be described, by way of non-limiting examples. DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION The following description is structured as follows. General embodiments and high-level variants are described in section 1. Section 2 addresses particularly preferred embodiments. Section 3 concerns technical implementation details. 
Note, the present method and its variants are collectively referred to as the <present methods=. All references Sn refer to methods steps of the flowcharts of FIGS.7 and 8, while numeral references pertain to devices, components, and concepts involved in embodiments of the present invention. P187721PC00 7 1. General embodiments and high-level variants A first aspect of the invention is now described in detail in reference to FIGS.1 – 4, and 7. This aspect concerns a computer-implemented method of enabling a contactless (or touchless) user interface. The method can for instance be performed by a computerized system 1, e.g., a computerized unit 200 (e.g., a smartphone, a tablet, a laptop) such as shown in FIG. 9. The system 1 comprises a processing unit 3, a camera 2, and a display device 4, as illustrated in FIG.1. Preferably, the computerized unit 200 integrates all the components (processing means, display device, and camera). In all cases, the steps of the method are executed, or triggered, by processing means, which are operatively connected to the camera 2 and the display device 4. However, some of these steps may possibly be offloaded to a paired processor or a connected device. E.g., such steps may be remotely performed at a server connected to a unit 200. The system 1 actually concerns another aspect, which is described later in detail. The method involves the execution S10 of a graphical user interface (GUI) 7. The GUI 7 may be executed as part of, or jointly with, an application or an operating system (OS) executing on the computerized system 1. The application may be any application, e.g., a game, a word processing application. Preferably, however, this application is a digital health application, such as a digital coaching application. As usual, executing the GUI causes to display one or more graphical elements 71, 72 on the display device 4, as illustrated in FIG.4. Such graphical elements 71, 72 may include any graphical control element, such as a cursor, a mouse pointer, a widget, or any graphical image (e.g., an icon representing any object), with which the user interacts or through which the user interacts with other graphical elements displayed by the GUI, as in embodiments discussed later. In the example of FIG.4, the elements 71, 72 include a cursor (or mouse pointer) 71 and a virtual object 72. A typical application scenario of the present methods in one in which the user wants to select the object 72 through the cursor 71, through touchless interactions. To that aim, the method comprises instructing a camera 2 to repeatedly acquire S20 images of a user 5. Images acquired by the camera 2 are then exploited to enable contactless user actions with the GUI, as illustrated in the diagram of FIG. 1. To this aim, the method instructs to perform a series of steps, for each image of at least some of the images acquired. That is, not all images may be exploited; some images may be dropped (S25, S35), for reasons discussed later. The following describes the series of steps performed to enable the contactless user actions. P187721PC00 8 As seen in the flow of FIG.7, such steps include executing S30 a pose tracking algorithm. Such an algorithm is known per se: it uses machine learning techniques to infer keypoints on the user’s body. The aim is to identify, and then update, coordinates of keypoints 51 – 54 of the user 5. The keypoints correspond to, i.e., are associated with, defined locations on the user body. 
Importantly, the identified keypoints notably include a target keypoint 53, which is used to control a GUI element 71, as explained below. Such keypoints are defined in a base reference frame FB corresponding to each image. The base reference frame FB of a given image is schematically depicted in FIG.3. The reference frame FB is distinct from the reference frame FU of the user, as also seen in FIG.3. A base reference frame is a frame corresponding to each image; it defines the keypoint space. Conversely, the user reference frame FU can be regarded as a frame that is <attached= to the user's body, see FIG. 3. A further reference frame FS is shown in FIG.4, which corresponds to the frame subtended by the display screen of the display device 4. Next, the method computes S50 anthropometric data from the updated coordinates of the keypoints. Interestingly, such anthropometric data capture a relative size of an anthropometric feature 6 of the user 5 in each image considered. That is, this relative size is an apparent dimension of the anthropometric feature 6, as seen by the camera. Note, the apparent size of the anthropometric feature is defined in the base reference frame FB. More precisely, the size of the anthropometric feature is computed based on coordinates of at least two of the keypoints, as defined in the base reference frame FB of each image. This anthropometric feature is an anthropometric characteristic, which can notably be an ear-to-ear distance, an eye-to-eye distance, a shoulder-to-shoulder distance, a torso area, a leg length, etc. In other words, this anthropometric feature corresponds to a physical characteristic of the user. In the example of FIGS.2 and 3, the anthropometric feature 6 corresponds to the ear-to-ear distance 45, which is calculated from the coordinates of selected keypoints 51, as defined in the base reference frame FB. The method further applies S40, S60 a coordinate transformation to the updated coordinates of the keypoints. The aim is to obtain transformed coordinates of the keypoints in the reference frame FU of the user 5. In turn, the method rescales S70 the transformed coordinates of the keypoints (i.e., the coordinates as now defined in the reference frame FU) according to the relative size of said anthropometric feature, i.e., in accordance with said anthropometric data. All keypoints of interest can be rescaled, including the target keypoint 53. This step is pivotal, P187721PC00 9 as it allows to compensate for changes in the user’s distance to the camera, as further discussed later. Having done so, the method can now update S80 a position of the graphical element 71 in the reference frame FS of the display according to the rescaled coordinates of the target keypoint 53. Once the position of the graphical element 71 has been updated S80, the method sends S90 a signal encoding the updated position to the display device 4 to accordingly move the graphical element 71 as displayed in the display screen of the display device 4. The same operations are repeated for each image of said at least some of the images acquired by the camera 2. This way, a touchless user interface control is enabled, which does not require any physical contact between the user 5 and any physical device. Interestingly, changes in the relative size (i.e., the apparent dimension) of the anthropometric feature 6 of the user are used as a proxy to compensate for user translations along the camera axis. 
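Purely to fix ideas, the per-image processing just outlined could be organized as in the sketch below. All names (pose_tracker.infer, user_frame_transform, the choice of ears and a wrist as keypoints) are hypothetical placeholders for steps S30 to S80 and are not mandated by the description; relative_size is as sketched earlier, and user_frame_transform stands for the frame transformation discussed further below.

    def process_frame(image, pose_tracker, d0, screen_w, screen_h):
        kps = pose_tracker.infer(image)                        # S30: keypoints in the base frame FB
        d = relative_size(kps["left_ear"], kps["right_ear"])   # S50: anthropometric data
        to_user = user_frame_transform(kps)                    # S40: FB -> FU transformation
        scale = d0 / d                                         # S70: rescaling factor
        kps_user = {k: tuple(scale * c for c in to_user(p))    # S60 + S70: transform, then rescale
                    for k, p in kps.items()}
        tx, ty, _ = kps_user["right_wrist"]                    # target keypoint guiding the cursor
        return int(tx * screen_w), int(ty * screen_h)          # S80: position in the display frame FS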
Additional user movements (e.g., rotations and translation parallel to the camera plane) can possibly be compensated, as in embodiments discussed later. The proposed method can be regarded as a real-time gesture tracking method, though not limited to hand, or head movements. Unlike other markerless systems relying on depth cameras or inertial measurement units (IMUs), here the method can typically rely on a conventional camera 2, such as a basic RGB camera. That is, the proposed method does not require any specific equipment. Rather, it can typically be performed using a standard computer (e.g., a laptop) a smartphone, or a tablet equipped with a standard camera. As such, the proposed method is particularly well suited to games and, more specifically, to digital health applications. An illustration of a digital health application is a scenario in which a person engages in home-based physical therapy. In this scenario, a software program is run on a computer equipped with an RGB webcam, so as to implement a method as described above, with a view to providing guidance and motivation during the performance of physical therapy exercises. To be visible to the camera, the person typically needs to stand at a distance from the computer, making it inconvenient to use traditional input devices like a keyboard or mouse. In such situations, the proposed method enables the user to interact in a contactless manner with the software and the computer, eliminating the need to physically touch any device to interact with the software. For example, this person may come to pause and resume a session, pause and resume playback of a video, adjust the sound volume, or exit the application. P187721PC00 10 Pose tracking algorithms mostly assume images obtained from RGB cameras. Thus, various types of pose tracking algorithms (which are known per se) may be contemplated for use in the present context. A pose tracking algorithm makes it possible to extract keypoints 51 – 54 using machine learning methods, which are also know per se. Examples of suitable pose tracking algorithms include the so-called BlazePose GHUM
(https://arxiv.org/abs/2210.06551), D3DP, MixSTE (https://arxiv.org/abs/2203.00859), U-CondDGConv@GT_2D_Pose (https://arxiv.org/pdf/2107.07797v2.pdf), and UGCN (https://arxiv.org/pdf/2004.13985v1.pdf) algorithms. Alternatively, an off-the-shelf algorithm may be adapted specifically to the present purpose, as the present inventors did. E.g., the pose tracking algorithm may be restricted to the extraction of a few, predetermined keypoints 51, 52, 52r, 53. As usual in gesture tracking, the keypoints 51 – 54 map "fixed points" of the user body 5; such points are fixed in the user reference frame FU but move in the base reference frame FB. The keypoints 51 – 54 are spatial locations of particular interest in each image. Thus, the user keypoints 51 – 54 can be regarded as "semantic keypoints", to the extent they are associated with defined points of the user body, corresponding to defined locations, such as the right shoulder, left shoulder, left hip, etc. Such locations are invariant to image transformations (e.g., rotation, translation, distortion, etc.). The target keypoint 53 is a selected point on the body; this keypoint 53 is used to guide the graphical element 71 (e.g., a cursor, as assumed in the following) on the display screen. That is, movement of this point 53 in the physical (i.e., real) world results in movement of the cursor 71 on the display screen. Outputs from the pose tracking algorithm are exploited in such a manner that the target keypoint 53 remains the same across successive images; it always corresponds to the same location on the user body. Note, a "reference frame" (also called "frame of reference", or "frame" for short) refers to an abstract coordinate system, whose origin, orientation, and scale are specified by reference points. However, unlike an abstract coordinate system, a reference frame is associated with a given thing (e.g., an image, the user body, etc.), to which a coordinate system is attached. As a result, it is possible to assign a location and time to any event in each reference frame. In practice, however, the terminologies "reference frame", "frame", and "coordinate system" are used interchangeably. In the present context, the "base reference frame" FB refers to an image acquired by the camera 2 (each image considered gives rise to a respective frame FB), the "reference frame of the user" FU (or "user reference frame") relates to the user, and the reference frame FS of the display refers to the screen plane of the display device 4. Each frame can be defined in dimension n (e.g., n = 2 or 3), which requires n + 1 reference points. E.g., when using Cartesian coordinates, a reference frame is defined by a reference point at the origin and a reference point at a unit distance along each of the n coordinate axes. While the dimension n is normally equal to 2 or 3, certain widgets such as scroll bars may occasionally require 1D movements only. In the present context, each reference frame is preferably defined in accordance with a Cartesian coordinate system, as usual in the field. The coordinate transformation performed at step S60 is a reference frame transformation. By definition, in the present context, the computerized system 1 has no information about the absolute distance of the user from the camera 2.
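As one possible (non-limiting) way to obtain such keypoints in practice, the snippet below uses the MediaPipe Python bindings of BlazePose; the parameters and the choice of landmarks are merely illustrative, and any of the pose tracking algorithms cited above could be substituted.

    import cv2
    import mediapipe as mp

    mp_pose = mp.solutions.pose

    cap = cv2.VideoCapture(0)  # standard RGB webcam, no depth sensor required
    with mp_pose.Pose(static_image_mode=False, model_complexity=1) as pose:
        ok, frame = cap.read()
        if ok:
            results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if results.pose_landmarks:
                lm = results.pose_landmarks.landmark
                left_ear = lm[mp_pose.PoseLandmark.LEFT_EAR]
                right_ear = lm[mp_pose.PoseLandmark.RIGHT_EAR]
                # x, y are normalized to [0, 1]; z is relative to the hips' midpoint
                print(left_ear.x, left_ear.y, left_ear.z, right_ear.x, right_ear.y)
    cap.release()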
Because the scheme used to compute the coordinates in the base reference frame FB is agnostic to the true, absolute distance of the user from the camera, the present method relies on the change in the relative size (i.e., the apparent dimension) of the anthropometric feature 6 of the user in the base reference frame to compensate for changes in the user’s distance to the camera 2. I.e., such changes are used as a proxy to compensate for user translations along the camera axis, i.e., the axis extending along a direction that is perpendicular to the camera plane. By definition, this anthropometric feature 6 has an essentially constant size in the reference frame FU of the user. Thus, the movement of the displayed graphical element 71 is made independent of a possible translation of the user in a direction perpendicular to the camera plane. As a result, a movement of the target keypoint 53 results in a consistent movement of the graphical element 71 as displayed by the display device 4, irrespective of the distance of the user from the camera 2. This makes it possible to efficiently track user gestures and, thus, enable a realistic, contactless user interface, irrespective of changes in the position of the user perpendicularly to the camera plane (i.e., movements along the camera axis). The transformed coordinates are preferably rescaled by applying a scaling function. Various scaling functions may be contemplated. In general, such functions can be defined as an algorithmic procedure, which involves a numerical or analytical calculation. Preferably, this function is defined analytically, to speed up calculations. For example, the scaling function used may be a rational polynomial function or a simple polynomial function f. This function may notably involve one or more constants c1, c2, …, d0, P187721PC00 12 s0, where d0 (respectively s0) corresponds to an initial relative length (respectively an initial relative area) of a constant (i.e., rigid) anthropometric feature 6, determined in accordance with at least two (respectively at least three) selected points of the keypoints 51 – 54. Two points define a line, while three coplanar points define a boundary of an area. For example, when using a 1D anthropometric feature of length d, the function may be of the form f(d) = (c1 + c2 d0)/(c1 + c2 d) or f = c1 + c2 d0/d. When using a 2D anthropometric feature of area s, the function may be of the form f(s) = (c1 + c2 s0)/(c1 + c2 s) or f(s) = c1 + c2 s0/s. More sophisticated approaches can be contemplated, which combine s and d variables, such as f(d, s) = (c1 + c2 d0 + c3 √s0)/(c1 + c2 d + c3 √s) or f(d, s) = c1 + c2 d0/d + c3 √s0/s. The variables d and s are updated for each image. In all of the above examples, the anthropometric data capture a relative size of a geometric element, which is bounded by at least two keypoints 51, selected from the identified keypoints 51 – 54, see FIGS.2 and 3. And the transformed coordinates of the keypoints are rescaled S70 according to a scaling function, which is preferably defined analytically. As illustrated above, the scaling function may take as argument a ratio of a reference size to the relative size of the anthropometric feature 6, as computed in the base reference frame FB for each image. For example, one may rely on a simple scaling factor of the form d0/d. Note, the target keypoint 53 is preferably distinct from the two keypoints bounding the line segment of length d. 
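A minimal sketch of such scaling functions is given below; the constants are placeholders, and the combined form is one possible reading of the expression above (the square root taken over the area ratio s0/s, so that areas and lengths scale consistently).

    def f_1d(d, d0, c1=0.0, c2=1.0):
        # f(d) = c1 + c2 * d0 / d, using a 1D anthropometric feature of length d
        return c1 + c2 * d0 / d

    def f_2d(s, s0, c1=0.0, c2=1.0):
        # f(s) = c1 + c2 * s0 / s, using a 2D anthropometric feature of area s
        return c1 + c2 * s0 / s

    def f_combined(d, s, d0, s0, c1=0.0, c2=0.5, c3=0.5):
        # f(d, s) = c1 + c2 * d0/d + c3 * sqrt(s0/s)
        return c1 + c2 * d0 / d + c3 * (s0 / s) ** 0.5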
As said, the two keypoints 51 used to compute d correspond to the ears in the example of FIGS.2 and 3. I.e., the relative size of the anthropometric feature is a relative length 45 of the line segment bounded by the two selected keypoints 51, such that the ratio d0/d is a ratio of a reference length d0 to the length d as computed for each image considered. More generally, though, these two keypoints may also correspond to eyes, iliac crests, greater trochanters, etc., and accordingly bound a rigid line segment of the user body. The length d is essentially constant in the user reference frame, subject to small (i.e., negligible) variations. In other words, said length is a distance between two keypoints on the user body that are essentially rigidly connected. Using an inverse function of the form d0/d reflects the optical reality as seen by the camera 2 well, inasmuch as the apparent size of an object is inversely proportional to the distance to the camera, owing to the perspective. That is, considering the laws of optics and perspective, an inverse function of the form d0/d will accurately compensate for perpendicular translations. In embodiments, the reference size d0 corresponds to a value of the size d as computed for an initial image of the images repeatedly acquired by the camera 2. For example, d0 is the length of the reference distance as measured at time t = 0 upon camera setup, and d is the same distance as measured at the current time point. Note, d is typically updated at a rate of 15 to 20 fps, as further discussed below. A convenient alternative is to fix d0, instead of initially measuring it. The length d0 may actually be any predefined length. For example, d may be measured relative to a fixed length of each image, e.g., the width or height of each image considered, or the length of its diagonal (i.e., the "size" of the screen). If necessary, a correction is applied to each image to compensate for an optical distortion (by the camera 2) of the updated coordinates of the keypoints, as obtained in the base reference frame FB. The radial distortion can for instance be described as follows:
xd = xu (1 + k1 r² + k2 r⁴ + k3 r⁶ + …), and
yd = yu (1 + k1 r² + k2 r⁴ + k3 r⁶ + …),
where xd, yd are the distorted coordinates, xu, yu are the undistorted coordinates, r is the distance from the optical centre (the origin of the camera axis on the camera plane), and k1, k2, k3 are the radial distortion coefficients. By determining the distortion coefficients, it is possible to invert the distortion equation to undistort a full image. A more efficient alternative is to undistort coordinates of the sole keypoints of interest. Besides, tangential distortion occurs when the camera's lens is not perfectly aligned with the image sensor. It can be modelled using additional parameters, i.e.,
xd = xu + 2 p1 x y + p2 (r² + 2 x²), and
yd = yu + p1 (r² + 2 y²) + 2 p2 x y,
where xd, yd are the distorted coordinates, xu, yu are the undistorted coordinates, x, y are the coordinates after correcting for radial distortion, and p1, p2 are the tangential distortion coefficients. So, applying the inverse of the above distortion equations makes it possible to correct tangential distortion. In each case, corrections can be applied to the sole keypoints of interest instead of the full image. In variants, though, such corrections can also be systematically applied to full images, as usual in the art.
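For instance, the sole keypoints of interest can be undistorted with OpenCV, assuming the intrinsic matrix and distortion coefficients have been obtained from a prior calibration of the camera (the numerical values below are placeholders, not calibration data from the application).

    import numpy as np
    import cv2

    # Hypothetical intrinsics and distortion coefficients (k1, k2, p1, p2, k3)
    K = np.array([[800.0,   0.0, 320.0],
                  [  0.0, 800.0, 240.0],
                  [  0.0,   0.0,   1.0]])
    dist = np.array([0.12, -0.05, 0.001, 0.0005, 0.0])

    def undistort_keypoints(pts_px):
        # Undistort only the keypoints of interest (pixel coordinates), not the full image
        pts = np.asarray(pts_px, dtype=np.float64).reshape(-1, 1, 2)
        out = cv2.undistortPoints(pts, K, dist, P=K)  # P=K keeps the result in pixel units
        return out.reshape(-1, 2)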
In further variants, the scaling function is devised to compensate for the optical distortion of the camera 2, at least partly, whereby the correction is performed S70 upon rescaling. That is, the scaling function can be purposely altered to compensate for the optical distortion of the hardware camera. E.g., this function can be devised as a series of the form f(d) = c1 + c2 d0/d + c3 (d0/d)² + …, still taking d0/d as argument. Alternatively, the function can again be devised as a rational polynomial function. A polynomial form is preferred, given that radial distortion can be modelled using polynomial equations. In each case, though, the parameters of the function can be adjusted to partly compensate for optical distortion. The camera 2 is configured to repeatedly acquire S20 images at a given frame rate R1, which is usually fixed. Still, the pose tracking algorithm can be instructed to execute S30 for each image of only a subset of the repeatedly acquired images. That is, the pose tracking algorithm is executed S30 at an average rate R2 that is strictly less than R1 (R2 < R1). Preferably, R2 is chosen so that R1/2 ≤ R2 ≤ 2 R1/3. That is, the refresh rate of the relative size of the anthropometric feature (e.g., the length d) is lowered to allow more time for the pose tracking algorithm to perform inferences, i.e., to infer the semantic keypoints. For example, the actual frame rate of the camera will typically be equal to 30 fps, whereas the rate for refreshing d is preferably set between 15 and 20 fps. The Nyquist sampling theorem advocates using a sampling frequency that is at least twice the highest frequency of interest. E.g., if a minimal frequency of 5 Hz is needed to adequately track human gestures, then the relative size d would have to be refreshed at a rate of 10 fps, at least. Note, it is admittedly tricky to define the highest frequency of interest for human activities, inasmuch as this frequency depends on the desired application. However, considering usual human activities (such as walking, for example), typical frequencies are between 0 and 20 Hz, and 98% of the FFT amplitude is already contained below 10 Hz. Thus, one may for instance consider the Nyquist/folding frequency to be on the order of 8 – 10 Hz, which would advocate using 16 – 20 Hz as a sampling rate. That is, if R0 is a maximal user interaction frequency of interest for the problem at issue, one may further impose R2 ≥ 2 R0, where R0 can be regarded as the Nyquist or folding frequency for the problem at hand. As noted earlier, a coordinate transformation is applied (at step S60, FIG. 7) to the updated coordinates of the keypoints to obtain transformed coordinates of the keypoints in the user reference frame FU. This coordinate transformation is a transformation between the base reference frame FB and the user reference frame FU. Interestingly, this transformation can be simply determined based on the updated coordinates (in the base reference frame FB) of keypoints 52, 52r selected from the keypoints 51 – 54. The selected keypoints 52, 52r include three non-collinear keypoints, which define a reference plane of the user, as shown in FIGS.2 and 3. Thus, in embodiments, the method further comprises determining S40, for each image, the coordinate transformation between the frame FB and the frame FU based on the updated coordinates of the selected keypoints 52, 52r, as computed in the frame FB.
The coordinate transformation is then applied S60 to the coordinates obtained in the reference frame FB. Note, unit vectors can be defined for the user reference frame FU, from the selected keypoints 52, 52r. One of these keypoints defines the origin of the user reference frame FU. A similar transformation is determined and then applied S60 for each new reference frame FB (i.e., for each new image considered). So, in practice, for each new image considered, the transformation (e.g., a translation and a rotation) is determined S40 and applied S60. Then, one rescales S70 the keypoint coordinates according to the anthropometric feature size, and subsequently maps S80 the target keypoint to a position on the display screen. As noted above, the frame FU (attached to the user's body) can be determined from the 3D coordinates of selected keypoints, as defined in the frame FB. The selected keypoints 52, 52r define axes x', y', z' of the frame FU, as illustrated in FIG. 3. Corresponding unit vectors are defined from the axes, given an origin 52r. Note, x', y', z' denote both the axes of the frame FU and the corresponding unit vectors in FIGS.2 and 3. Similarly, x, y, z denote both the axes of the frame FB and the corresponding unit vectors in FIG.3. In practice, such points 52, 52r can easily be determined thanks to outputs from the pose tracking algorithm. For example, as seen in FIG. 3, if the frame FU is centred on keypoint 52r, corresponding to the left shoulder, y' can be taken as the axis that is parallel to the shoulder-to-shoulder axis, i.e., the axis extending through the keypoints corresponding to the right shoulder and the left shoulder. Conversely, the axis x' can be taken as a vector parallel to the shoulder-to-hip axis. Thus, the remaining axis/unit vector z' is perpendicular to the plane spanned by x' and y'. So, a suitable keypoint selection makes it possible to determine the coordinates of the three unit vectors x', y', z' in the frame FB. Such unit vectors can then be used to determine the Euler angles between the frames FB and FU. As evoked earlier, the coordinate transformation determined at step S40 preferably combines a rotation and a translation. And, as noted above, the rotation can be defined based on Euler angles between the base reference frame FB and the reference frame of the user 5. In addition, the translation can be defined based on coordinates of a reference keypoint in the base reference frame FB. A reference plane (x', y') is defined by three non-collinear keypoints 52, 52r, which are selected from the keypoints 51 – 54, see FIG. 2. The reference keypoint 52r is selected among the keypoints 52 and corresponds to the origin of the user reference frame. I.e., the reference keypoint 52r and the reference plane (x', y') are used to define the user reference frame FU, which translates and rotates together with the user 5. So, the user reference frame FU has its origin at the reference keypoint 52r, and its XY plane coincides with the reference plane. As a result, the movement of the cursor 71 is isolated (i.e., made independent) from any rotation of the user around the human longitudinal axis (i.e., cephalocaudal axis) when standing upright in the real-world space. That is, a movement of the target keypoint 53 results in a consistent movement of the cursor 71, independently of any rotation of the user 5 in front of the camera 2. Similarly, the movement of the cursor is isolated from any translation of the user parallel to the camera plane.
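A minimal sketch of how the unit vectors x', y', z' could be derived from the selected keypoints (left shoulder taken as origin 52r, as in the example above). The extra re-orthogonalization step is an implementation choice of this sketch, since the shoulder-to-shoulder and shoulder-to-hip directions are generally not exactly perpendicular.

    import numpy as np

    def user_frame_axes(left_shoulder, right_shoulder, left_hip):
        # 3D keypoint coordinates expressed in the base frame FB
        o = np.asarray(left_shoulder, dtype=float)               # origin 52r
        y_axis = np.asarray(right_shoulder, dtype=float) - o     # shoulder-to-shoulder direction (y')
        x_axis = np.asarray(left_hip, dtype=float) - o           # shoulder-to-hip direction (x')
        z_axis = np.cross(x_axis, y_axis)                        # normal to the reference plane (z')
        x_axis = np.cross(y_axis, z_axis)                        # re-orthogonalize x' against y' and z'
        e1, e2, e3 = (v / np.linalg.norm(v) for v in (x_axis, y_axis, z_axis))
        return e1, e2, e3, o   # unit vectors of FU (in FB) and the origin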
Rigid translations of the whole body in the XY plane thus do not result in undesired movements of the cursor 71. This improves user interactions and the user experience. In more detail, the Euler angles {α, β, γ} between two coordinate systems (say xyz and x'y'z') are derived from the rotation matrix between the two coordinate systems, knowing that:
α = atan2(R(1, 0)/cos(β), R(0, 0)/cos(β)),
β = atan2(−R(2, 0), sqrt(R(0, 0)² + R(1, 0)²)), and
γ = atan2(R(2, 1)/cos(β), R(2, 2)/cos(β)),
where R is the rotation matrix and R(i, j) denotes the value in row i and column j. The above equations assume a "ZYX" Euler angle convention (yaw, pitch, and roll). Now, to compute the rotation matrix R that transforms one coordinate system to another, information about the basis vectors of the two coordinate systems is needed. The basis vectors of the xyz coordinate system can be noted e1 = (e1x, e1y, e1z), e2 = (e2x, e2y, e2z), e3 = (e3x, e3y, e3z). In the present context, these may always be set to e1 = (1, 0, 0), e2 = (0, 1, 0), and e3 = (0, 0, 1). Similarly, for the second x'y'z' coordinate system, the basis vectors can be written e1' = (e1'x, e1'y, e1'z), e2' = (e2'x, e2'y, e2'z), and e3' = (e3'x, e3'y, e3'z). These are uniquely defined by the choice of the non-collinear keypoints chosen to define the second reference system. Next, the basis vectors can be normalized, to ensure they are unit vectors; i.e., if such vectors are not unit vectors, they are divided by their respective norm (i.e., their length). The rotation matrix R that transforms vectors from the xyz coordinate system to the x'y'z' coordinate system is subsequently computed as
R = [e1' e2' e3'] [e1 e2 e3]^T,
where [e1' e2' e3'] is a matrix whose columns are the basis vectors of the x'y'z' coordinate system, [e1 e2 e3] is a matrix whose columns are the basis vectors of the xyz coordinate system, and T denotes the transpose operation. The vectors e1', e2', and e3' are orthonormal. To avoid gimbal lock, a different convention can be used for the Euler rotation order, or a quaternion representation of the rotation can be used. Interestingly, the vectors and matrices involved can be padded so as to allow a single vector-matrix multiplication. Thus, in embodiments, the coordinate transformation is defined and applied S60 as a single matrix multiplication, as discussed in detail in Section 2. This repetitive operation can advantageously be offloaded to a GPU, if any.
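Since the basis e1, e2, e3 of the xyz frame is the identity, the rotation matrix reduces to the column-stacked user basis. A short sketch follows (ZYX convention, matching the formulas above; no special handling of the gimbal-lock case is shown):

    import numpy as np

    def rotation_matrix(e1p, e2p, e3p):
        # Columns are the (orthonormal) basis vectors of the x'y'z' frame, expressed in xyz
        return np.column_stack([e1p, e2p, e3p])

    def euler_zyx(R):
        beta = np.arctan2(-R[2, 0], np.hypot(R[0, 0], R[1, 0]))
        alpha = np.arctan2(R[1, 0] / np.cos(beta), R[0, 0] / np.cos(beta))
        gamma = np.arctan2(R[2, 1] / np.cos(beta), R[2, 2] / np.cos(beta))
        return alpha, beta, gamma   # yaw, pitch, roll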
A computer vision algorithm may be used to detect a particular gesture of the user 5, which gesture will trigger the action. Alternatively, or in addition, a timer may trigger this action based on a time duration during which a position of the first graphical element 71 coincides with a position of the second graphical element 72. Note, the positions of the first and second graphical elements 71, 72 may be considered to coincide if one of the elements (e.g., a mouse pointer) overlaps with the second (e.g., an icon). If the action performed consists of a selection of the second graphical element 72, then subsequent steps of the method are henceforth performed in respect of the element 72. That is, upon detecting a selection of the element 72, the method instructs, for each image subsequently obtained (S20, S25) from the camera 2, to execute S30 the pose tracking algorithm and rescale S70 the subsequently transformed coordinates with a view to updating S80, S90 a position of the second graphical element 72, as so far done in respect of the first graphical element 71. Referring now to FIG. 5, attractors can advantageously be used to ease manipulations by the user. In particular, the position of the first graphical element 71 can be updated S80 according to an attractor field assigned to the second graphical element 72. The update step S80 additionally makes use of the rescaled coordinates of the target keypoint 53, as explained earlier. Various types of attractors can be contemplated, which are known per se. However, it is preferred to rely on an attractor field that is devised so as to be minimal at the centre of the second graphical element 72 and beyond an attraction field boundary surrounding the second graphical element 72, and maximal at an intermediate distance between the centre of the second graphical element 72 and the attraction field boundary, as illustrated in FIG.5. Use can for instance be made of a cosine function, which is maximal at half the distance to the centre of the second element 72, see FIG. 6. The function actually depicted is ½ (1 − cos(2πu)), where u corresponds to the normalized radial distance (x-axis), while the value of the function reflects the attraction magnitude (y-axis). That is, this function is used to model the intensity of the attractor along the direction of a vector extending from the centre of a displayed graphical element to the position of the cursor 71, as discussed in detail in section 2 (an illustrative sketch is also given further below). FIG.7 shows a preferred flow. The GUI starts running at step S10. A loop is started at step S20, whereby the camera repeatedly acquires images. Only a subset of the images produced are fed S30 to the pose tracking algorithm, for it to identify (the first time) and then update keypoint coordinates. I.e., some of the images are dropped (S25: No, S35). The coordinate transformation is determined at step S40 and then applied (step S60) to the updated coordinates of the keypoints, so as to obtain transformed coordinates of the keypoints in the user reference frame. In parallel to steps S40 – S60, anthropometric data are computed at step S50; the transformed coordinates of the keypoints are rescaled at step S70, according to the anthropometric data. The position of a graphical element in the display reference frame is accordingly updated at step S80 and a corresponding signal is sent S90 to the display device to move the graphical element displayed.
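An illustrative sketch of the attractor just described; the gain and radius values are hypothetical tuning parameters, not values taken from the application.

    import math

    def attraction_magnitude(u):
        # Attractor intensity as a function of the normalized radial distance u in [0, 1]:
        # zero at the centre (u = 0) and at the attraction boundary (u = 1), maximal at u = 0.5
        if u <= 0.0 or u >= 1.0:
            return 0.0
        return 0.5 * (1.0 - math.cos(2.0 * math.pi * u))

    def apply_attractor(cursor, target_centre, radius, gain=0.05):
        # Pull the cursor (first graphical element) towards the centre of the second element
        dx, dy = target_centre[0] - cursor[0], target_centre[1] - cursor[1]
        dist = math.hypot(dx, dy)
        if dist == 0.0 or dist >= radius:
            return cursor
        w = gain * attraction_magnitude(dist / radius)
        return (cursor[0] + w * dx, cursor[1] + w * dy)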
As seen in FIG. 8, a monitoring algorithm can be run in parallel to the flow of FIG. 7 (this corresponds to Flow #1, step S110 in FIG. 8). The relative position of the first and second graphical elements (see FIG.4) is monitored at step S120, which is continually performed. If the positions of the two elements are determined to match (S130: Yes), a computer vision algorithm is run, and/or a timer is triggered. Next, the method checks S140 whether a certain criterion is met. I.e., is the computer vision algorithm able to identify a predetermined gesture? Has a predefined time period elapsed? If so (S140: Yes), an action (e.g., selection, start execution, display drop-down menu, etc.) is triggered at step S150 in respect of the second graphical element. Note, the algorithm keeps on monitoring S120 the relative position of the graphical elements, irrespective of the outcomes of steps S130 and S140. Next, according to another aspect, the invention can be embodied as a computerized system 1 for enabling a contactless user interface. The computerized system 1 comprises a camera 2, a display device 4, and processing means 230. The system 1 may be a desktop computer 3, a smartphone, a tablet, a laptop, etc. In all cases, the processing means 230 are configured to execute S10 a GUI 7, instruct the camera 2 to repeatedly acquire S20 images of a user 5, and perform steps (see, e.g., steps S25 – S90 in FIG. 7) as described earlier in reference to the present methods. The processing means may for instance be adequately configured by loading software (e.g., as initially stored in a storage device 255) into the main memory of the system 1 and having the processing means 230 execute the corresponding instructions. Closely related, a final aspect of the invention concerns a computer program product for enabling a contactless user interface. The computer program product comprises a computer readable storage medium having program instructions embodied therewith. The program instructions are executable by processing means 230 of a computerized system 1 as described above, to cause the computerized system 1 to take steps according to the present methods. The program instructions may typically embody several software modules, as depicted in FIG. 10. That is, in addition to physical components (camera 2, display device 4), the processing means 230 may be configured to implement several software modules 231 – 238. For example, a first module 231 is run to execute the pose tracking algorithm, which causes the coordinate update module 232 to update the coordinates of the keypoints in the base reference frame FB. The module 233 is run to compute anthropometric data from the updated coordinates. The module 234 computes and applies the coordinate transformation to obtain transformed coordinates of the keypoints in the user reference frame FU. The transformed coordinates are then rescaled (according to the anthropometric data) by the module 235, whereby a tracking module 236 can update positions of the graphical elements to be displayed in the display reference frame FS in accordance with the rescaled coordinates of the target keypoint. Additional modules 237 – 238 may be involved. For example, an optical distortion module 237 may be run to correct keypoint coordinates. Moreover, a monitoring module 238 may be executed, which may run a computer vision algorithm to identify specific user gestures. Alternatively, or in addition to the computer vision algorithm, the module 238 may involve a timer to trigger an action in respect of a graphical element displayed, as explained earlier.
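As an illustration of the timer-based variant of the monitoring algorithm (dwell-time selection), one might proceed as follows; the dwell duration and the bounding-box representation of the second graphical element are assumptions of this sketch.

    import time

    class DwellMonitor:
        # Trigger an action when the cursor stays over a displayed element for long enough
        def __init__(self, dwell_seconds=1.5):
            self.dwell_seconds = dwell_seconds
            self._entered_at = None

        def update(self, cursor, element_bbox):
            x, y = cursor
            x0, y0, x1, y1 = element_bbox
            inside = x0 <= x <= x1 and y0 <= y <= y1
            if not inside:
                self._entered_at = None       # cursor left the element: reset the timer
                return False
            if self._entered_at is None:
                self._entered_at = time.monotonic()
            return time.monotonic() - self._entered_at >= self.dwell_seconds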
The signals consumed and produced by the various modules 231 – 238 and components 2, 4 transit through input/output (I/O) management units 260 – 270, see also FIG. 9. In particular, the updated positions of the graphical elements can be sent to the display device 4, for it to accordingly display movements of elements manipulated by the user, in operation. Additional features of the present computerized systems 1 and computer program products are described in section 3. The above embodiments have been succinctly described in reference to the accompanying drawings and may accommodate a number of variants. Several combinations of the above features may be contemplated. Examples are given in the next section.

2. Particularly preferred embodiments

The following describes particularly preferred embodiments, which rely on algorithms that allow users to interact with a computer application through gestures captured by an RGB camera. Specifically, such gestures can be used to press "buttons" (i.e., icons) and move a cursor on the display screen. Such algorithms allow a mode of user interaction that does not require physical contact with any device, whether the main device on which the method primarily executes or a peripheral device. Thus, no peripheral device is needed. Instead, the main device (e.g., unit 200 in FIG. 9) captures the user's movements through an RGB camera 2. A computer vision algorithm processes the camera's images in real time, identifying and tracking a set of keypoints 51 – 54 on the body, such as the hands, wrists, shoulders, hips, knees, ankles, and feet. The position of a selected keypoint (i.e., target keypoint 53) is mapped to the position of a cursor 71 on the display screen, which the user can use to interact with the computer application. The proposed approach allows an intuitive use of the cursor 71 through the following innovations:
  • The movement of the cursor is independent of the rotation of the user in space;
  • The movement of the cursor is independent from translations of the user parallel to the camera plane; and
  • The movement of the cursor is independent from translations of the user in the direction perpendicular to the camera plane.
When the position of the cursor overlaps with that of a button or any other interactive element on the display screen, the user can interact with it in two ways: (a) by performing a specific hand gesture (e.g., opening and closing the hand, which would correspond to a mouse click); or (b) by overlapping the cursor's position with that of the button for a predetermined amount of time. The two mechanisms may possibly be concurrently implemented. As described in the previous section, keypoints correspond to actual body locations in the "world space", i.e., the physical world in which the user moves. The target keypoint is a point on the body that guides the cursor on the display screen. Movement of this point in the real world results in movement of the cursor on the display screen. A reference keypoint and reference plane are used to define a reference frame that translates and rotates together with the user. The reference plane is defined by the three non-collinear keypoints 52, 52r, see FIG. 2. The reference system has its origin at the reference keypoint 52r, and its XY plane coincides with the reference plane. The reference distance corresponds to the anthropometric feature, i.e., a distance between two keypoints on the body that are rigidly connected (e.g., the ears, the eyes).
2D images from the camera 2 can be associated with a base reference frame FB. The keypoints are generated by the pose tracking algorithm. They consist of XYZ coordinates for a set of points on the human body, identified from the 2D image of the RGB camera 2. The XY coordinates of each keypoint are set between 0 and 1, depending on the position of the keypoint in the image. The top left corner is [0, 0], and the bottom right corner is [1, 1]. The z-coordinate encodes information in the direction perpendicular to the camera frame. It is defined relative to the point at the centre of the hips and indicates whether another keypoint is in front of or behind the hips, using numbers between 0 and 1 that roughly correspond to the unit of the other axes x, y. Note, the z-coordinate only provides relative information between the centre of the hips and another keypoint. It does not capture information as to the absolute distance of the user from the camera. The display screen is associated with a further frame FS.

The user can control the position of the cursor 71 by moving the target keypoint 53. The aim is to map the position of a target point 53 in the physical space to the position of the cursor 71 on the display screen and allow the user to interact with elements displayed on the display screen. The following description addresses the following problems:
• The movement of the cursor should ideally be isolated from a rotation of the user in the physical space. I.e., a movement of the target keypoint should ideally result in a consistent movement of the cursor, independently of any rotation of the user with respect to the camera;
• Furthermore, the movement of the cursor should ideally be isolated from translations of the user parallel to the camera plane. Rigid translations of the whole body in the XY direction should not result in movement of the displayed cursor 71;
• The movement of the cursor should be isolated from the translation of the user along the camera axis, i.e., in the direction perpendicular to the camera plane. That is, a movement of the target keypoint should result in a consistent movement of the cursor, independently of the distance of the user from the camera plane; and
• Lack of fine motor control, latency in the computation of the keypoints, and/or jittering of the keypoints, may prevent the user from guiding the pointer 71 towards a desired interactive element in a smooth and intuitive way.

The following describes an adaptive mapping function to compensate for translations and rotations of the user. Referring to FIG. 3, the proposed solution is to transform the coordinates of the target keypoint 53 from the base reference frame FB, defined by axes x, y, z (unit vectors of the frame FB are defined along said axes), to the user reference frame FU attached to the user's body and defined by axes x′, y′, z′. Note, unit vectors of the frames FB and FU are defined along the axes x, y, z and x′, y′, z′, whereby the frames FB and FU can respectively be referred to as the xyz and x′y′z′ frames. The coordinate transformation sought can be defined as:
\[
\begin{bmatrix} x' \\ y' \\ z' \\ 1 \end{bmatrix}
=
\begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix},
\]
where R is a 3 × 3 rotation matrix and T is the 3 × 1 translation vector. The matrix on the right-hand side represents the combined transformation. The rotation matrix R can be defined using a combination of rotations around the x-axis, y-axis, and z-axis. The general form of the rotation matrix is R = R_z(α) R_y(β) R_x(γ), where α, β, and γ are the Euler angles between the xyz frame and the x′y′z′ frame, and R_x, R_y, and R_z are the rotation matrices. The rotation matrices around each axis can be defined as:
\[
R_z(\alpha) = \begin{bmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{bmatrix},
\qquad
R_y(\beta) = \begin{bmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{bmatrix},
\]
\[
R_x(\gamma) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\gamma & -\sin\gamma \\ 0 & \sin\gamma & \cos\gamma \end{bmatrix}.
\]

T is given by T = [T_x, T_y, T_z], where T_x, T_y, and T_z are the coordinates of the reference keypoint 52r in the xyz frame. This transformation ensures that the cursor's position is not affected by translations or rotations of the user in space, but only by relative movements between the target keypoint and the reference keypoint.

The following describes an adaptive mapping function to estimate the depth thanks to a rigid anthropometric feature, so as to isolate the cursor movements from any translation of the user along the camera axis. Because the z-coordinates in the frame FB do not indicate the absolute distance of the user from the camera, we may advantageously rely on the change in size of a rigid object as a proxy for the user's distance from the camera. This object can for instance be chosen as the segment 45 (also called the reference segment, see FIG. 3). The x and y coordinates can be scaled using the following formula:
\[
\begin{bmatrix} x_s \\ y_s \end{bmatrix} = \frac{d_0}{d} \begin{bmatrix} x' \\ y' \end{bmatrix},
\]
where d_0 is the length of the reference distance measured at time t = 0 (upon camera setup), and d is the reference distance at the current time point. The distance d is regularly updated, at a rate of 15 to 20 fps. As a result, the target position is not affected by the distance of the user from the camera.

Moreover, use can be made of attractors. Indeed, the lack of fine motor control, latency in the computation of the landmark positions, and/or jittering of the landmark position computation, may prevent the user from guiding the pointer towards a desired element 72 in a smooth and intuitive way. A solution is to add "attractors" around the targets 72. Targets can be any elements on the display screen, such as buttons and widgets, which the user can interact with through the cursor 71. The added attractors may act as gravity fields, pulling the cursor towards the centre of the interactive element when the cursor is within an attraction distance and thus caught by the attraction field. Care should be taken to carefully design such attractors, to prevent the user from accidentally pressing the wrong button or getting stuck in the element 72. A possibility is to design the attractors so that they act by adding an offset o to the position of the pointer p, when the cursor is within the attraction distance. This yields a vector p_o defined by: p_o = p + o. Use can advantageously be made of a "cosine velocity attractor" for the computation of o, namely:
\[
o_{t+1} = o_t + C\,(a - p)\,\frac{1 - \cos(2\pi u)}{2},
\qquad u = \frac{\lvert p - a \rvert}{\lvert p_b - a \rvert},
\]
where a are the coordinates of the centre of the attractor (see the black dot at the centre of the shape 72 in FIG. 4), p points to the pointer position (see the white dot in FIG. 4), and p_b points to the projection (see the patterned dot in FIG. 4) of p onto the attraction field border 80. Note, the projection is actually defined as the extension of p to the boundary 80. In other words, p_b points to a point where the boundary 80 is intersected by the axis passing through p and originating from the centre of the element 72. The constant C is adjusted to control the rate of attraction of the cursor 71 to the centre of the graphical element 72.

The behaviour of the above attractor can be described as follows. The cursor 71 is not attracted when located at the centre of the attractor (to prevent it from getting stuck at the shape 72) or at (or beyond) the boundary 80 of the attractor. However, the cursor 71 is attracted when located between the boundary 80 and the centre of the attractor. Note, this relation affects the velocity of the offset. A cosine-like function determines the amplitude of the attraction field, see FIG. 5. The cosine function is equal to 0 at u = 0 and at u = 1 (with u = |p − a| ⁄ |p_b − a|, the argument of the cosine being 2πu), and equal to 1 at u = 0.5. Such values correspond to the centre of the attractor, the edge of the attractor, and the edge of the attraction field, i.e., the boundary 80, assuming the attraction field has a diameter that is twice the diameter of the attractor. Note, this function is clipped to 0 for u > 1 and undefined for u < 0.

The accumulation of offset over multiple interactions with attractive elements may sometimes lead to the pointer drifting off the display screen. To counteract this, we may advantageously apply an exponential decay to the offset. That is, when the pointer is not within the attraction field of an attractor, one may use
\[
o_{t+1} = D\,o_t,
\]
where D is the decay constant (0 < D < 1). This ensures that the offset will go back to zero when the attraction is gone. The complete formulation of the offset value thus becomes:
\[
o_{t+1} =
\begin{cases}
o_t + C\,(a - p)\,\dfrac{1 - \cos(2\pi u)}{2}, & \text{if the pointer is within the attraction field},\\[4pt]
D\,o_t, & \text{otherwise.}
\end{cases}
\]
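To make the attractor mechanics concrete, here is a small Python sketch of one possible per-frame offset update implementing a cosine-shaped attraction with exponential decay, in line with the reconstruction of the equations above. The discrete-time formulation, the function name, and the parameter values are illustrative assumptions rather than prescribed values.

```python
import numpy as np

def update_offset(o, p, a, field_radius, C=0.05, D=0.9):
    """One per-frame update of the attractor offset o (2D numpy arrays).

    o            : current offset added to the raw pointer position
    p            : raw pointer position (display-frame coordinates)
    a            : centre of the interactive element (attractor centre)
    field_radius : radius of the attraction field boundary 80, i.e. |p_b - a|
    C            : attraction-rate constant
    D            : exponential decay constant, 0 < D < 1
    """
    u = np.linalg.norm(p - a) / field_radius
    if 0.0 < u < 1.0:
        # Inside the attraction field: cosine-shaped pull towards the centre,
        # zero at the centre (u = 0) and at the boundary (u = 1), maximal at u = 0.5.
        amplitude = 0.5 * (1.0 - np.cos(2.0 * np.pi * u))
        o = o + C * (a - p) * amplitude
    else:
        # At the centre or outside the field: let the accumulated offset decay to zero.
        o = D * o
    return o

# The displayed cursor position is then p_o = p + o.
```

In practice, C and D would be tuned so that the pull is perceptible yet the cursor can still be driven out of the field, the decay branch realizing the exponential decay described above.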
3. Technical implementation details

In embodiments, the method steps S10 – S150 described earlier in reference to FIGS. 7 and 8 are implemented in (or triggered by) software, e.g., one or more executable programs, executed by processing means 230 of the computerized system 1, which may include one or more computerized units such as depicted in FIG. 9. Preferably, though, the system 1 consists of a single unit 200, such as a smartphone, a tablet, a laptop, or a desktop computer, which integrates all required components, starting with the camera 2, the display device 4, and the processing means 230.

Computerized devices and components can be suitably configured for implementing embodiments of the present invention as described herein. In that respect, it can be appreciated that the methods described herein are partly non-interactive, i.e., partly automated. Automated parts of such methods can be implemented in software, hardware, or a combination thereof. In exemplary embodiments, automated parts of the methods described herein are implemented in software, as a service or an executable program (e.g., an application), the latter executed by suitable digital processing devices. The methods described herein are typically in the form of an executable program, a script, or, more generally, any form of executable instructions.

As depicted in FIG. 9, a typical computerized device (or unit) 200 may include a processor 230 and a memory 250 (possibly including several memory units) coupled to one or more memory controllers 240. The processor 230 is a hardware device for executing software loaded in a main memory of the device. The processor can be any custom-made or commercially available processor. The processor may notably be a central processing unit (CPU), as assumed in FIG. 9. Note, however, that some of the operations (in particular related to the pose tracking) may possibly be offloaded to a peripheral processing unit, such as a GPU, or remotely executed, e.g., by a server in data communication with the unit 200.

The memory 250 of the unit 200 typically includes a combination of volatile memory elements (e.g., random access memory) and non-volatile memory elements, e.g., a solid-state device. The software in memory may include one or more separate programs, each of which comprises executable instructions for implementing functions as described herein. In the example of FIG. 9, the software in the memory includes methods described herein in accordance with exemplary embodiments and a suitable OS. The OS essentially controls the execution of other computer (application) programs and provides scheduling, input-output control, file and data management, memory management, and communication control and related services. It may further control the distribution of tasks to be performed by various processing units.

The computerized unit 200 can further include a display controller 282 coupled to a display device 4. In exemplary embodiments, the computerized unit 200 further includes a network interface 290 or transceiver for coupling to a network (not shown). In addition, the computerized unit 200 will typically include one or more input and/or output (I/O) devices 2, 210, 220 (or peripherals, including the camera 2) that are communicatively coupled via a local I/O controller 260. A system bus 270 interfaces all components. Further, the local interface may include address, control, and/or data connections to enable appropriate communications among the aforementioned components.
The I/O controller 260 may have additional elements, which are omitted for simplicity, such as controllers, buffers (caches), drivers, repeaters, and receivers, to allow data communication.

When the computerized unit 200 is in operation, one or more processing units 230 execute software stored within the memory of the computerized unit 200, to communicate data to and from the memory 250 and/or the storage unit 255 (e.g., a hard drive and/or a solid-state memory), and to generally control operations pursuant to software instructions. The methods described herein and the OS are read (in whole or in part) by the processing elements, typically buffered therein, and then executed. When the methods described herein are implemented in software, the methods can be stored on any computer readable medium for use by or in connection with any computer-related system or method.

Computer readable program instructions described herein can be downloaded to processing elements from a computer readable storage medium, via a network, for example, the Internet and/or a wireless network. A network adapter card or network interface 290 may receive computer readable program instructions from the network and forward such instructions for storage in a computer readable storage medium 255 interfaced with the processing means 230.

Aspects of the present invention are described herein notably with reference to a flowchart and a block diagram. It will be understood that each block, or combinations of blocks, of the flowchart and the block diagram can be implemented by computer readable program instructions. These computer readable program instructions may be provided to one or more processing elements 230 as described above, to produce a machine, such that the instructions, which execute via the one or more processing elements, create means for implementing the functions or acts specified in the blocks of the flowcharts of FIGS. 7, 8 and the block diagram of FIG. 10. These computer readable program instructions may also be stored in a computer readable storage medium.

The flowchart and the block diagram in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of the computerized unit 200, methods of operating it, and computer program products, according to various embodiments of the present invention. Note that each computer-implemented block in the flowcharts or the block diagram may represent a (sub)module, or a set of instructions, which comprise(s) executable instructions for implementing the functions or acts specified therein. In variants, the functions or acts mentioned in the blocks may occur out of the order specified in the figures. For example, two blocks shown in succession may actually be executed in parallel, concurrently, or still in a reverse order, depending on the functions involved and the algorithm optimization retained. It is also reminded that each block and combinations thereof can be adequately distributed among special-purpose hardware components.

While the present invention has been described with reference to a limited number of embodiments, variants, and the accompanying drawings, it will be understood by those skilled in the art that various changes may be made, and equivalents may be substituted without departing from the scope of the present invention.
In particular, a feature (device-like or method-like) recited in a given embodiment, variant, or drawing may be combined with or replace another feature in another embodiment, variant, or drawing, without departing from the scope of the present invention. Various combinations of the features described in respect of any of the above embodiments or variants may accordingly be contemplated, which remain within the scope of the appended claims. In addition, many minor modifications may be made to adapt a particular situation or material to the teachings of the present invention without departing from its scope. Therefore, it is intended that the present invention is not limited to the particular embodiments disclosed, but that the present invention will include all embodiments falling within the scope of the appended claims. In addition, many other variants than those explicitly touched upon above can be contemplated. For example, other types of attractors may be relied on.

Claims

CLAIMS

1. A computer-implemented method of enabling a contactless user interface, the method comprising: executing (S10) a graphical user interface (7) to display a graphical element (71) on a display device (4); instructing a camera (2) to repeatedly acquire (S20) images of a user (5); and instructing, for each image of at least some of the images acquired, to execute (S30) a pose tracking algorithm to update coordinates of keypoints (51 – 54) of the user (5) in a base reference frame (FB) corresponding to said each image, the keypoints including a target keypoint (53), compute (S50), from the updated coordinates, anthropometric data capturing a relative size of an anthropometric feature (6) of the user (5) in said each image, apply (S40, S60) a coordinate transformation to the updated coordinates to obtain transformed coordinates of the keypoints in a reference frame (FU) of the user (5), rescale (S70) the transformed coordinates according to said anthropometric data, update (S80) a position of the graphical element (71) in a reference frame (FS) of the display according to the rescaled coordinates of the target keypoint (53), and send (S90) a signal encoding the updated position to the display device (4) to accordingly move the graphical element (71) displayed.

2. The computer-implemented method according to claim 1, wherein the anthropometric data computed includes a relative size of a geometric element bounded by at least two (51) selected ones of the keypoints (51 – 54), and the transformed coordinates are rescaled (S70) according to a scaling function taking as argument a ratio of a reference size to said relative size as computed in the base reference frame (FB) for said each image.

3. The computer-implemented method according to claim 2, wherein said relative size is a relative length of a line segment bounded by two selected ones of the keypoints (51), and said ratio is a ratio of a reference length to said length as computed for said each image.

4. The computer-implemented method according to claim 2 or 3, wherein the reference size corresponds to a value of said size as computed for an initial image of the repeatedly acquired images.

5. The computer-implemented method according to claim 2 or 3, wherein the reference size is determined in accordance with one or each of a width and a height of said each image.

6. The computer-implemented method according to any one of claims 1 to 5, wherein the method further comprises, for said each image, applying a correction to compensate for an optical distortion of the updated coordinates of the keypoints, in the base reference frame (FB), by the camera (2).

7. The computer-implemented method according to any one of claims 1 to 6, wherein the camera (2) is configured to repeatedly acquire (S20) said images at a given frame rate R1; the pose tracking algorithm is instructed to execute for each image of only a subset of the repeatedly acquired images, at an average rate R2 that is strictly less than R1; and, preferably, R1/2 ≤ R2 ≤ 2 R1/3.
8. The computer-implemented method according to any one of claims 1 to 7, wherein the method further comprises, for said each image, determining the coordinate transformation between the base reference frame (FB) and the reference frame (FU) of the user based on the updated coordinates of selected ones (52, 52r) of the keypoints (51 – 54) in the base reference frame (FB), and the selected ones (52, 52r) of the keypoints include three non-collinear keypoints (52, 52r), which define the reference frame (FU) of the user (5).

9. The computer-implemented method according to claim 8, wherein the coordinate transformation applied (S60) combines a rotation and a translation, the rotation is defined based on Euler angles between said base reference frame (FB) and the reference frame of the user (5), and the translation is defined based on coordinates of a reference keypoint (52r) in the base reference frame (FB), wherein the reference keypoint (52r) is selected from the three non-collinear keypoints and defines an origin of the reference frame (FB) of the user (5).

10. The computer-implemented method according to claim 9, wherein the coordinate transformation is defined and applied (S60) as a single matrix multiplication, and the coordinate transformation is optionally applied (S60) by a graphics processor unit.

11. The computer-implemented method according to any one of claims 1 to 10, wherein said graphical element (71) is a first graphical element (71), the graphical user interface (7) is executed (S10) to display a second graphical element (72) on the display device (4).

12. The computer-implemented method according to claim 11, wherein the method further comprises running (S130 – S140) a monitoring algorithm to detect (S130: Yes; S140: Yes) a potential action to be performed (S150) on the second graphical element (72) based on a relative position, in the reference frame (FS) of the display, of the first graphical element (71) and the second graphical element (72).

13. The computer-implemented method according to claim 12, wherein the monitoring algorithm includes one or each of: a computer vision algorithm to detect a particular gesture of the user (5) triggering said action; and a timer triggering said action based on a time duration during which a position of the first graphical element (71) coincides with a position of the second graphical element.

14. The computer-implemented method according to claim 11 or 13, wherein said potential action is a selection of the second graphical element (72), and the method further comprises, upon detecting said potential action, instructing, for each image of at least some of the images subsequently acquired by the camera (2), to execute the pose tracking algorithm and rescale the subsequently transformed coordinates with a view to updating a position of the second graphical element (72), as previously done in respect of the first graphical element (71).

15. The computer-implemented method according to any one of claims 11 to 14, wherein the position of the first graphical element (71) is updated (S80) according to an attractor field of the second graphical element (72), in addition to said rescaled coordinates of the target keypoint (53).
16. The computer-implemented method according to claim 15, wherein said attractor field is devised so as to be minimal at a centre of the second graphical element (72) and beyond an attraction field boundary surrounding the second graphical element (72), and maximal at an intermediate distance between the centre of the second graphical element (72) and the attraction field boundary.

17. A computerized system for enabling a contactless user interface, wherein the computerized system comprises a camera (2), a display device (4), and processing means, the latter configured to: execute (S10) a graphical user interface (7) to display a graphical element (71) on the display device (4); instruct the camera (2) to repeatedly acquire (S20) images of a user (5); and perform, for each image of at least some of the images acquired, each of the following steps: executing (S30) a pose tracking algorithm to update coordinates of keypoints (51 – 54) of the user (5) in a base reference frame (FB) corresponding to said each image, the keypoints including a target keypoint (53), computing (S50), from the updated coordinates, anthropometric data capturing a relative size of an anthropometric feature (6) of the user (5) in said each image, applying (S40, S60) a coordinate transformation to the updated coordinates to obtain transformed coordinates of the keypoints in a reference frame (FU) of the user (5), rescaling (S70) the transformed coordinates according to said anthropometric data, updating (S80) a position of the graphical element (71) in a reference frame (FS) of the display according to the rescaled coordinates of the target keypoint (53), and sending (S90) a signal encoding the updated position to the display device (4) to accordingly move the graphical element (71) displayed.

18. A computer program product for enabling a contactless user interface, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by processing means of a computerized system, which further comprises a camera (2) and display device (4), to cause the computerized system to: execute (S10) a graphical user interface (7) to display a graphical element (71) on a display device (4); instruct a camera (2) to repeatedly acquire (S20) images of a user (5); and instruct, for each image of at least some of the images acquired, to execute (S30) a pose tracking algorithm to update coordinates of keypoints (51 – 54) of the user (5) in a base reference frame (FB) corresponding to said each image, the keypoints including a target keypoint (53), compute (S50), from the updated coordinates, anthropometric data capturing a relative size of an anthropometric feature (6) of the user (5) in said each image, apply (S40, S60) a coordinate transformation to the updated coordinates to obtain transformed coordinates of the keypoints in a reference frame (FU) of the user (5), rescale (S70) the transformed coordinates according to said anthropometric data, update (S80) a position of the graphical element (71) in a reference frame (FS) of the display according to the rescaled coordinates of the target keypoint (53), and send (S90) a signal encoding the updated position to the display device (4) to accordingly move the graphical element (71) displayed.
PCT/EP2023/066218 2023-06-16 2023-06-16 Enabling a contactless user interface for a computerized device equipped with a standard camera Pending WO2024256017A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2023/066218 WO2024256017A1 (en) 2023-06-16 2023-06-16 Enabling a contactless user interface for a computerized device equipped with a standard camera

Publications (1)

Publication Number Publication Date
WO2024256017A1 (en)

Family

ID=87003209

Country Status (1)

Country Link
WO (1) WO2024256017A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110306421A1 (en) * 2010-06-11 2011-12-15 Namco Bandai Games Inc. Image generation system, image generation method, and information storage medium
US20120327125A1 (en) * 2011-06-23 2012-12-27 Omek Interactive, Ltd. System and method for close-range movement tracking
US20150116213A1 (en) * 2011-08-23 2015-04-30 Hitachi Maxell, Ltd. Input unit
US10782847B2 (en) * 2013-01-15 2020-09-22 Ultrahaptics IP Two Limited Dynamic user interactions for display control and scaling responsiveness of display objects

Similar Documents

Publication Publication Date Title
US11269481B2 (en) Dynamic user interactions for display control and measuring degree of completeness of user gestures
JP7213899B2 (en) Gaze-Based Interface for Augmented Reality Environments
EP2755194B1 (en) 3d virtual training system and method
CN102449577B (en) Virtual desktop coordinate transformation
KR101453815B1 (en) 2014-10-22 Device and method for providing user interface which recognizes a user's motion considering the user's viewpoint
Lu et al. Immersive manipulation of virtual objects through glove-based hand gesture interaction
US20130343607A1 (en) Method for touchless control of a device
JP7575160B2 (en) Method and system for selecting an object - Patent application
CN103443742A (en) Systems and methods for a gaze and gesture interface
CN107771309A (en) 3D user input
CN110215685B (en) Method, device, equipment and storage medium for controlling virtual object in game
KR20150040580A (en) virtual multi-touch interaction apparatus and method
US20160004315A1 (en) System and method of touch-free operation of a picture archiving and communication system
Placidi et al. Data integration by two-sensors in a LEAP-based Virtual Glove for human-system interaction
CN113658249B (en) Virtual reality scene rendering method, device, equipment and storage medium
EP3944228A1 (en) Electronic device, method for controlling electronic device, program, and storage medium
WO2024256017A1 (en) Enabling a contactless user interface for a computerized device equipped with a standard camera
CN107957781B (en) Information display method and device
US20140062997A1 (en) Proportional visual response to a relative motion of a cephalic member of a human subject
Liu et al. COMTIS: Customizable touchless interaction system for large screen visualization
JP7513262B2 (en) Terminal device, virtual object operation method, and virtual object operation program
KR20220108417A (en) Method of providing practical skill training using hmd hand tracking
US11604517B2 (en) Information processing device, information processing method for a gesture control user interface
Majewski et al. Providing visual support for selecting reactive elements in intelligent environments
US20160004318A1 (en) System and method of touch-free operation of a picture archiving and communication system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23734185

Country of ref document: EP

Kind code of ref document: A1