WO2021073743A1

WO2021073743A1 - Determining user input based on hand gestures and eye tracking

Info

Publication number: WO2021073743A1
Application number: PCT/EP2019/078214
Authority: WO
Inventors: Pertti Ikalainen; Jari Tuomas Savolainen
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2019-10-17
Filing date: 2019-10-17
Publication date: 2021-04-22
Anticipated expiration: 2022-04-17

Abstract

A system (10) for providing a human-computer interface particularly suitable for head-mounted displays (20) such as AR and VR glasses. The system (10) combines an eye tracking module (11) and a hand gesture module (12) so that the users (14) can privately navigate in a virtual or augmented environment using their eyes to locate a virtual object (29) and execute functions (28) of the virtual object (29) with simple and intuitive finger movements.

Description

DETERMINING USER INPUT BASED ON HAND GESTURES AND EYE

TRACKING TECHNICAL FIELD

The disclosure relates generally to user input detection, more particularly to methods and systems for controlling augmented and/or virtual reality devices using hand gesture recognition in combination with eye tracking.

BACKGROUND

For many years, human-computer interactions have been mostly carried out using a standard keyboard and/or mouse, with a screen providing a user with visual feedback of the keyboard and/or mouse input. With the constantly improving technology of computer-based devices and the development of mobile smart phones, smart watches and smart AR and VR glasses, these keyboards have now become more and more ineffective or, in the case of VR glasses, even impossible means of providing instructions. While these recently developed digital devices provide a range of functionalities, they typically suffer from the shortcoming that they require the use of at least one hand or the voice of the user as input for operation. For example, smart phones typically include a touchscreen which is utilized to enter nearly all input instructions or to navigate a user interface. This can be particularly burdensome for those without full use of at least one of their hands, either due to medical issues such as disabilities or due to the nature of the work being performed. Speech input has recently also gained popularity in controlling digital devices, however outside of the home environment talking commands aloud raises lots of privacy issues (e.g. possibly sharing sensitive information) or may even prove impossible in noisy environments.

In the specific case of AR and VR glasses, the common input methods such as a keyboard or touchscreen cannot be used at all. Therefore, further alternative methods have been developed to control the user interface of AR/VR glasses by head movement, touching the side of the glasses, tracking hands with a camera, using a separate controller unit, or using eye tracking with a built-in camera. However, none of these solutions are intuitive or comfortable alone, and most of them can only provide input solution for very simple on/off functions. Head movement and gestures are not intuitive methods of interacting with a user interface as people move their head normally in a very slow and smooth way, in contrast to how it would be required for operation of these interfaces. Hand tracking cannot be used for more than few seconds before the hands of users get tired. Separate controllers often get lost, are difficult to carry around and their batteries need charging. In search of more intuitive means for human computer interaction for smart glasses, other solutions such as eye tracking and hand gesture recognition have become available in recent years.

US patent US7572008 for example describes a method and an installation for detecting and tracking eyes and gaze angles to be used for observing or determining the position on a monitor or display at which a computer user is looking.

US patent application US20160313801 A1 describes a method and apparatus for a gesture-controlled interface for wearable devices, wherein the apparatus includes bio-potential sensors for detecting bio-electrical signals from the body of the user. The bio-potential sensors may include surface nerve conduction (SNC) sensors for detecting surface nerve conduction signals to be compared with data of reference signals corresponding to known gestures, thus enabling the apparatus to identify a known gesture from the plurality of known gestures that corresponds to the surface nerve conduction signal. The problem of these existing solutions however is that they alone cannot provide enough dimensions and freedom for long lasting interactions with ARA/R glasses in a way that is private, intuitive and can be an ergonomic experience. SUMMARY

It is an object to provide an improved method and system for determining user input which overcomes or at least reduces the problems mentioned above, by combining eye tracking and hand or finger movement detection for gesture recognition.

The foregoing and other objects are achieved by the features of the independent claims. Further implementation forms are apparent from the dependent claims, the description and the figures.

According to a first aspect, there is provided a system for determining user input comprising: an eye tracking module configured to identify a gaze location of a user and determine a gaze vector based on the gaze location; a hand gesture module configured to identify a hand gesture of a user and determine an interaction signal based on the hand gesture; and a host device in data connection with the eye tracking module and the hand gesture module, the host device comprising a processor configured to determine a virtual location based on the gaze vector, determine an interaction based on the interaction signal, and execute a function at the virtual location corresponding to the interaction. The system combines eye tracking (that may be integrated into AR glasses) and nerve or other hand or finger movement or move intention signal detection (that may be integrated into a smart watch). By combining these two systems the users can privately navigate, using only their eyes, through the (augmented) user interface and select functions with simple and intuitive finger movements. The solution can thus be compared to the introduction of mouse for the personal computer. However, instead of a physical device that is moved by hand and clicking buttons, the presented solution provides a way to “move the cursor” with the eyes and “press a button” with gentle finger gestures. The users can even keep their hands in their pockets and control the host device (such as AR glasses) in a private manner. In response to simple and intuitive inputs of the user the system can quickly and accurately perform certain functions mapped to commands on the selected location (e.g. virtual object), whereby the mapping of gestures to commands may be universally defined, across many users, facilitating development of various applications which employ at least some commonality in user interface.

In a possible implementation form of the first aspect the hand gesture module comprises at least one of a hand-mounted unit, a wrist-mounted unit, or a finger- mounted unit, and wherein the hand gesture module is further configured to identify the hand gesture of the user by detecting a signal indicating at least one of a movement or an intention of movement of at least one finger of the user.

In a further possible implementation form of the first aspect the hand gesture module is further configured to identify a hand gesture comprising the user pressing together a thumb and any of the remaining four fingers on one hand separately or in any possible combination, and the processor is further configured to determine the interaction based on reference data for the interaction signal, the reference data comprising: in case of pressing together a thumb and an index finger, determine “Select” as interaction, in case of pressing together a thumb and a middle finger, determine “Back” as interaction, in case of pressing together a thumb and a ring finger, determine “Settings or Menu” as interaction, in case of pressing together a thumb and a pinky finger, determine “Home” as interaction.

In a further possible implementation form of the first aspect the hand gesture module is a wrist-mounted unit comprising at least one surface nerve conduction, SNC, sensor configured to detect at least one SNC signal, and wherein the hand gesture module is further configured to identify the hand gesture based on the at least one SNC signal.

In a further possible implementation form of the first aspect the hand gesture module is further configured to identify pressure p applied by the at least one finger of the user based on the amplitude and a frequency of the at least one detected SNC signal.

In a further possible implementation form of the first aspect the system further comprises a head-mounted device comprising the eye tracking module and a display configured to provide Augmented Reality or Virtual Reality to the user, the Augmented Reality or Virtual Reality comprising at least one virtual object; and the eye tracking module is further configured to determine the gaze vector by determining a gaze of the user on the display by tracking eye movement of the user, identifying a focus area on the display based on the gaze of the user being located in an area of the display for a time duration t exceeding a predefined focus time threshold tr, and determining coordinates of the gaze vector based on coordinates of the focus area on the display; and the processor is further configured to identify a selected virtual object based on the gaze vector, the selected virtual object being displayed in the focus area, and to execute a function of the selected virtual object corresponding to the interaction. In a further possible implementation form of the first aspect the head-mounted device is a host device comprising the processor; and the processor is further configured to execute a function of the selected virtual object corresponding to at least one of a “Select” interaction or a “Back” interaction determined based on the interaction signal. According to a second aspect, there is provided a method for determining user input, the method comprising: identifying a gaze location of a user and determining a gaze vector based on the gaze location using an eye tracking module; identifying a hand gesture of a user and determining an interaction signal based on the hand gesture using a hand gesture module; determining a virtual location based on the gaze vector; determining an interaction based on the interaction signal; and executing a function at the virtual location corresponding to the interaction.

By combining the methods of eye tracking and hand gesture recognition, the users of the system can privately navigate, using only their eyes, through the (augmented) user interface and select functions with simple and intuitive finger movements. The presented solution provides a way to “move the cursor” with the eyes and “press a button” with gentle finger gestures. The users can keep their hands in their pockets and control the host device (such as AR glasses) in a private manner. In response to simple and intuitive inputs of the user the system can quickly and accurately perform certain functions mapped to commands on the selected location (e.g. virtual object), whereby the mapping of gestures to commands may be universally defined, across many users, facilitating development of various applications which employ at least some commonality in user interface.

In a possible implementation form of the second aspect the hand gesture is identified by at least one of a hand-mounted unit, a wrist-mounted unit, or a finger- mounted unit, and identifying the hand gesture further comprises detecting a signal indicating at least one of a movement or an intention of movement of at least one finger of the user.

In a further possible implementation form of the second aspect identifying the hand gesture comprises identifying the user pressing together a thumb and any of the remaining four fingers on one hand separately or in any possible combination; and determining the interaction is further based on based on reference data for the interaction signal, the reference data comprising: in case of pressing together a thumb and an index finger, determine “Select” as interaction, in case of pressing together a thumb and a middle finger, determine “Back” as interaction, in case of pressing together a thumb and a ring finger, determine “Settings or Menu” as interaction, in case of pressing together a thumb and a pinky finger, determine “Home” as interaction.

In a further possible implementation form of the second aspect identifying the hand gesture further comprises detecting at least one surface nerve conduction, SNC, signal using a wrist-mounted unit comprising at least one SNC sensor; and identifying the hand gesture based on the at least one SNC signal.

In a further possible implementation form of the second aspect identifying the hand gesture further comprises identifying pressure p applied by the at least one finger of the user based on the amplitude and a frequency of the at least one detected SNC signal.

In a further possible implementation form of the second aspect the method further comprises: providing Augmented Reality or Virtual Reality to the user on a display of a head- mounted device, the head-mounted device comprising an eye tracking module, and the Augmented Reality or Virtual Reality comprising at least one virtual object; determining a gaze of the user on the display by tracking eye movement of the user using the eye tracking module; identifying a focus area on the display based on the gaze of the user being located in an area of the display for a time duration t exceeding a predefined focus time threshold tr, determining coordinates of the gaze vector based on coordinates of the focus area on the display; identifying a selected virtual object based on the gaze vector, the selected virtual object being displayed in the focus area; and executing a function of the selected virtual object corresponding to the interaction.

In a further possible implementation form of the second aspect the head-mounted device is a host device comprising a processor, the processor being configured to execute a function of the selected virtual object corresponding to at least one of a “Select” interaction or a “Back” interaction determined based on the interaction signal.

According to a third aspect, there is provided a system for determining user input comprising: an eye tracking module; a hand gesture module; a host device in data connection with the eye tracking module and the hand gesture module, the host device comprising a processor; and a storage device configured to store instructions that, when executed by the processor, cause the components of the system to perform a method according to any one of the possible implementation forms of the second aspect.

These and other aspects will be apparent from and the embodiment(s) described below.

BRIEF DESCRIPTION OF THE DRAWINGS

In the following detailed portion of the present disclosure, the aspects, embodiments and implementations will be explained in more detail with reference to the example embodiments shown in the drawings, in which:

Fig. 1 shows a flow diagram of a method for determining user input in accordance with an embodiment of the first aspect, implemented on a system in accordance with a corresponding embodiment of the second aspect; Fig. 2 shows a flow diagram of determining different interactions based on hand gestures in accordance with an embodiment of the first aspect, implemented on a system in accordance with a corresponding embodiment of the second aspect;

Fig. 3 shows a flow diagram of identifying the a hand gesture based on an SNC signal in accordance with an embodiment of the first aspect, implemented on a system in accordance with a corresponding embodiment of the second aspect;

Fig. 4 illustrates a head-mounted device in accordance with an embodiment of the second aspect;

Fig. 5 shows a flow diagram of identifying a virtual object in a displayed Augmented Reality or Virtual Reality in accordance with an embodiment of the first aspect, implemented on a system in accordance with a corresponding embodiment of the second aspect;

Fig. 6 shows a flow diagram of executing different functions of a selected virtual object corresponding to different interactions in accordance with an embodiment of the first aspect, implemented on a system in accordance with a corresponding embodiment of the second aspect; and

Fig. 7 shows a block diagram of a system for determining user input in accordance with one embodiment of the second aspect.

DETAILED DESCRIPTION Fig. 1 illustrates in a combined flow diagram a system 10 as well as method steps for determining user input according to the present disclosure, e.g. for controlling an Augmented Reality or a Virtual Reality device, such as the head-mounted device illustrated in Fig. 4.

The system comprises at least an eye tracking module 11 , a hand gesture module 12, and a host device 13 comprising a processor 15.

“Eye tracking” herein refers the process of measuring either the point of gaze (where one is looking) or the motion of an eye relative to the head, whereby an “eye tracking module” refers to a device configured to measure eye positions and eye movement. There are a number of methods for measuring eye movement, the most popular ones using captured photo or video images from which the eye position is extracted, wherein other alternative methods use search coils or are based on the electrooculogram.

In an exemplary embodiment, the eye tracking module 11 comprises a camera or multiple cameras facing a user 14 which may capture one or more images, which are used to determine the presence of human body features (e.g., face, nose, ears) to facilitate the identification of human eyes. If the camera or cameras are located close to the face of the user 14, they may capture images of sufficient resolution to facilitate estimation of the gaze location 21 . For wearable devices, a camera may be placed on the device itself facing the user's eyes, enabling gaze location detection.

A gaze location detection subsystem may further be implemented in the eye tracking module 11 . The gaze location detection subsystem may use one or more eye gaze point direction estimation and/or detection techniques to estimate and/or detect a direction of view. A region of interest (ROI) subsystem may determine coordinates of a ROI on an image being captured by the camera. The size of the ROI and confidence level of accurate detection may be determined by the technique or techniques used for gaze point detection. Either or both of these parameters may be used by the system 10 to determine the size of the ROI in which to perform a search.

In another possible embodiment, a gaze location detection subsystem may analyze an image of the eye and may determine the gaze direction by computing the vector defined by the pupil center and a set of glints generated in the eye by an infrared illuminator. To increase the resolution of the vector, a camera with a narrow field of view may be used. Maintaining the eyes centered in the image, the camera may move to follow the eyes and compensate for the head movements.

Another example gaze location detection subsystem may allow combined tracking of the user's eye positions and the gaze direction in near real-time. Such a system may use two video cameras mounted on the left and right side of a display 16 and may use facial feature detection to determine the position of the pupil in the eyes. A cornea-reflex method may be used to determine the gaze direction. For example, a low-power infrared-light emitting diode (LED) array may illuminate the eye and may generate a highlight on the cornea surface. An algorithm may identify and localize the center of both the pupil and the corneal surface reflection. The distance between the two centers and their orientation (e.g., gaze vector 22) may provide a measure of the gaze direction.

“Hand gesture” herein refers to a form of non-verbal or non-vocal, visible bodily action executed by any part of a user’s hand (such as any or all of the fingers or the open palm, or the closed fist), whereby a “hand gesture module” refers to a device configured to recognize such a hand gesture, for example using motion or electric current sensors and mathematical algorithms for interpreting signals from these sensors. In an embodiment, the hand gesture module 12 comprises at least one of a hand-mounted unit, a wrist-mounted unit, or a finger-mounted unit.

In an embodiment, the hand gesture module 12 is configured to identify a hand gesture 25 of a user 14 by detecting a signal 24 indicating at least one of a movement or an intention of movement of at least one finger of the user 14.

“Host device” herein refers to any computer-based device comprising at least one processor and configured to communicate with other hosts or components within a computer-based network. A host may work as a server offering information resources, services, and applications to users, components or other hosts on the network; or work as a client that initiates requests for such services.

In some embodiments the host device 13 may for example be a mobile communications handset device such as a smartphone or a multi-function cellular phone, or a mobile terminal or user equipment of a wireless communication system. In some embodiments, as illustrated in Fig. 4, the host device 13 may be implemented in the form of smart glasses such as an AR or VR glasses (see detailed description below).

The host device 13 of the system 10 according to the present disclosure is in data connection with the eye tracking module 11 and the hand gesture module 12. “Data connection” herein refers to any transmission of (digital) data via wired or wireless communications. The (digital) data is typically transmitted via communications interfaces such as a transmitter and/or receiver. The transceiver and/or receiver means can in some embodiments be configured to communicate via a wireless or wired coupling. The coupling can be any suitable known communications protocol, for example in some embodiments the transceiver and/or receiver means can use a suitable universal mobile telecommunications system (UMTS) protocol, a wireless local area network (WLAN) protocol such as for example IEEE 802.X, a suitable short-range radio frequency communication protocol such as Bluetooth, or infrared data communication pathway (IRDA). It is to be understood again that the structure of the electronic device could be supplemented and varied in many ways.

In a preferred embodiment, the communications interface 18 implemented in the host device 13 is a Bluetooth Low Energy (BLE) controller providing reduced power consumption for wireless communication.

As illustrated in Fig. 1 , the method for determining user input using the combination of the eye tracking module 11 and the hand gesture module 12 comprises several steps.

In a first step 101 , a gaze location 21 of a user 14 is identified and, in a subsequent step 102, a gaze vector 22 is determined based on the gaze location 21 using the eye tracking module 11 . The gaze location 21 and the gaze vector 22 can be determined using any known eye tracking apparatus via any known method, some of which are describe above. In an embodiment the gaze vector 22 is a 2- dimensional vector indicating the gaze location 21 on the display 16, or on a graphical user interface (GUI) displayed on the display 16.

In a next step 103, a hand gesture 25 of a user 14 is identified and, in a subsequent step 104, an interaction signal 26 is determined based on the hand gesture 25 using the hand gesture module 12. The hand gesture 25 can be determined using any known gesture recognition method, some of which are describe above. The interaction signal 26 may comprise simple indication of two fingers that the user 14 pressed together, such as for example a thumb and an index finger indicated by the interaction signal [T + I]

After the gaze vector 22 and the interaction signal 26 are determined, the processor 15 of the host device receives these as input data and determines 105 a virtual location 23 based on the gaze vector 22, and further determines 106 an interaction 27 based on the interaction signal 26.

In a final step 107, the processor 15 can execute a function 28 at the virtual location 23 corresponding to the interaction 27.

“Function” herein refers to any command that can be interpreted by a processor in the context of a virtual environment (such as an AR or VR environment, or a GUI shown on a display), either on its own or in relation to a virtual entity (such as a virtual location or a virtual object). Such functions are for example selecting or opening a virtual entity, opening a contextual menu linked to a virtual entity, going back one step in a sequence of commands or returning to a home screen from an application or contextual menu.

In an embodiment illustrated in Fig. 2, identifying 103 the hand gesture 25 comprises identifying the user 14 pressing together a thumb and any of the remaining four fingers on one hand separately or in any possible combination, and determining 106 the interaction 27 is further based on based on reference data for the interaction signal 26.

In a possible embodiment, in case of pressing together a thumb T and an index finger I, the processor 15 determines “SELECT” or “OPEN” or “FORWARD” as interaction 27 based on the reference data.

In a possible embodiment, in case of pressing together a thumb T and a middle finger M, the processor 15 determines “BACK” as interaction 27 based on the reference data.

In a possible embodiment, in case of pressing together a thumb T and an ring finger R, the processor 15 determines “SETTINGS” or “MENU” as interaction 27 based on the reference data. In a possible embodiment, in case of pressing together a thumb T and an pinky finger P, the processor 15 determines “HOME” as interaction 27 based on the reference data.

In an embodiment illustrated in Fig. 3, the hand gesture module 12 is a wrist- mounted unit comprising at least one surface nerve conduction (SNC) sensor 19 configured to detect at least one SNC signal 24A. In this embodiment, the hand gesture module 12 is configured to identify the hand gesture 25 based on at least one SNC signal 24A. This technology can effectively determine which muscle is the target of intended motion or is about to move without the need to detect the electrical signals directly associated with contraction of the muscles.

Some parts of the human body, such as the wrist, contain relatively little muscle tissue and, in particular, relatively little of the bulky part of muscle tissue that produces the predominant portion of the force resulting from contraction of the muscle. Placement of the electrical potential detectors on the skin at parts of the body that have relatively little such muscle tissue can reduce the relative amplitude of “noise” that the muscle contraction electrical signals produce at the skin surface.

In addition, at the wrist the nerves for which the nerve electrical signals are to be detected lie between the detector on the surface of the skin and muscle or other tissue. Therefore, electrical signals occurring at the other tissues may be blocked or attenuated by intervening tissue on their way to the skin. As a result, placement of the electrical potential detectors on the skin at the wrist can further reduce the “noise” in the electrical potential signals associated with the nerve.

Electrical signals occurring in different types of tissue and occurring in the body for different purposes may have different spectral characteristics. For example, the typical frequency range of muscle contraction electrical signals is 20 Hz to 300 Hz with dominant energy in the 50 Hz to 150 Hz range while the typical frequency range of nerve electrical signals is 200 Hz to 1000 Hz. By applying appropriate filters during signal processing of the detected electrical potentials, it is possible to enhance, for example, the relative strength of the nerve electrical signals. The surface nerve conduction (SNC) sensors 19 may be capable of detecting nerve signals from the carpus, wherein the signals are caused by movement or intent of movement of the user 14. In some embodiments, at least three SNC sensors may be required in order to accurately detect the nerve activity from the main nerves (i.e., one sensor for each of the three main nerves). Optionally, the SNC sensors 19 may be aligned in a configuration of multiple pairs in order to detect different sources of electric activity, since each nerve creates a signal in a specific location (for instance a sensor on the back side of an arm may not detect signals of movement on the front of the arm). In a possible embodiment, using the processor 15, the detected at least one surface nerve conduction signal 24A is compared with data of a plurality of reference signals corresponding to a plurality of known hand gestures 25, each of the reference signals distinctly associated with one of the known hand gestures 25. A known hand gesture 25 is identified from the plurality of known hand gestures 25 that corresponds to at least one surface nerve conduction signal 24A. The identified known hand gesture 25 is communicated to the host device 13 (in the form of an interaction signal 26).

In accordance with some embodiments, identifying the known hand gesture 25 includes de-noising the detected at least one surface nerve conduction (SNC) signal, detecting an event in the at least one SNC signal, applying segmentation for determining one or more frames of the detected event, extracting statistical features within the one or more frames, and applying a classification algorithm based on the data to the extracted statistical features so as to determine the known gesture. In accordance with some embodiments, the known hand gesture 25 includes pressing together of at least two fingers, and identifying the pressing together of the at least two fingers includes assessing that the at least one detected surface nerve conduction signal 24A includes an amplitude and a frequency proportional to pressure p applied between the at least two fingers. In accordance with some embodiments of the present invention, the method includes estimating the pressure applied between the at least two fingers by applying the one or a plurality of detected bio-electrical signals to a proportional control pipeline including a convolutional neural network (CNN) and a long short term memory (LSTM) neural network.

In some embodiments, an array of sensors (such as SNC sensors 19) may be integrated into the wrist band of an existing smart-watch, or alternatively may serve as a stand-alone device. Processing the data from these sensors may be accomplished with real-time “machine learning” using a digital signal processing unit (DSP).

In an embodiment illustrated in Fig. 4, the system 10 comprises a head-mounted device (HMD) 20 comprising an eye tracking module 11 and a display 16.

The head-mounted device (HMD) 20 may be any suitable wearable electronic device that enables electronically generated data to be superimposed on a user's field of view. The HMD 20 may consist of a frame which comes into direct skin contact with the user's face around the nose and above the ears. Contact with the nose may be further established via nose bridge and nose pads.

The display is configured to provide at least one of an Augmented Reality or Virtual Reality environment to the user 14.

In the case of providing a Virtual Reality environment, the HMD 20 may comprise a stereoscopic head-mounted display (providing separate images for each eye), stereo sound, and head motion tracking sensors, which may include gyroscopes, accelerometers, magnetometers, and structured light systems to generate realistic images, sounds and other sensations that simulate a user's physical presence in a virtual environment.

In the case of providing augmented reality (AR), the user's view of the real world is digitally enhanced (or augmented) by adding a layer, or layers, of digital information on top of an image being viewed through the HMD 20. Some applications of AR may include sightseeing (e.g., providing information on nearby businesses or attractions), gaming (e.g., digital game play in a real-world environment), navigation, and others. Applications of AR may be suitable for wireless transmit/receive units (WTRUs), such as mobile devices, because mobile devices may be equipped with cameras, sensors, a global positioning system (GPS), and a gyroscope (such as to determine the direction of the camera view). A WTRU also has send/receive capabilities to interact with a server. Augmented Reality applications may further allow a user to experience information, such as in the form of a three-dimensional virtual object overlaid on a picture of a physical object captured by a camera of a device. The physical object may include a visual reference that the augmented reality application can identify. A visualization of the additional information, such as the three-dimensional virtual object overlaid or engaged with an image of the physical object is generated in a display of the device. The three-dimensional virtual object may be selected based on the recognized visual reference. A rendering of the visualization of the three- dimensional virtual object may be based on a position of the display relative to the visual reference.

The Augmented Reality or Virtual Reality according to the present disclosure comprises at least one virtual object 29. The steps of identifying a selected virtual object 29 based on data from the eye tracking module 11 are illustrated in Fig. 5, wherein a head-mounted device 20 comprising an eye tracking module 11 and a display 16 is provided in an initial step 201, and a gaze vector 22 is determined based on the gaze of the user 14 tracked on the display 16 using the eye tracking module 11 as described below.

In a next step 202, the gaze of the user 14 is determined on the display 16 by tracking eye movement of the user 14 using the eye tracking module 11.

In a next step 203, a focus area 30 on the display 16 is determined based on the gaze of the user 14 being located in an area of the display 16 for a time duration t exceeding a predefined focus time threshold tr. In an embodiment the focus time threshold is tr= 1 s. In a next step 204, the coordinates of the gaze vector 22 are determined based on coordinates of the focus area 30 on the display 16.

In a next step 205, a selected virtual object 29 is identified based on the gaze vector 22, the selected virtual object 29 being displayed in the focus area 30.

In a next step 206, a function 28 of the selected virtual object 29 is executed by the processor 15 corresponding to the interaction 27.

In an embodiment, the head-mounted device (HMD) 20 is a host device 13 comprising the processor 15 that is configured to execute a function 28 of the selected virtual object 29 corresponding to an interaction 27 determined based on an interaction signal 26. This embodiment is particularly suitable for manipulating virtual objects 29 which are superimposed onto a user's field of view in an HMD 20.

In an embodiment illustrated in Fig. 6, the processor 15 of the host device 13 (such as a head-mounted device 20) can determine a selected virtual object 29 as described above, after a user 14 gazed upon the area of the display 16 in which the virtual object 29 is located for more than 1s. The selected virtual object 29 may be highlighted on the display 16 as a feedback for the user 14 indicating that the system 10 has recognized the interest in the virtual object 29. After the selection, a function 28 can be executed on the selected virtual object 29 corresponding to a determined interaction 27, based e.g. on detected SNC signals from a wrist- mounted unit. In an example, as illustrated, the user 14 may press together a thumb T and an index finger I which results in opening the virtual object 29 or press together a thumb T and a ring finger R which results in opening the settings of the virtual object 29. The user 14 may then press together a thumb T and a middle finger M which results in going back from the object 29 or its settings, or press together a thumb T and a pinky finger P which results in going back to the home menu on the user interface with no virtual object 29 selected. Similarly, if the user changes the focus area 30 e.g. by gazing upon a different sub-area of the display 16, it also results in de-selecting the previously selected virtual object 29. Fig. 7 shows a block diagram of a system 10 for determining user input according to the present disclosure. Steps and features that are the same or similar to corresponding steps and features previously described or shown herein are denoted by the same reference numeral as previously used for simplicity.

The system comprises at least an eye tracking module 11 , a hand gesture module 12, a host device 13 in data connection with the eye tracking module 11 and the hand gesture module 12, the host device 13 comprising a processor 15 and a storage device 17.

The host device 13 may further comprise a display 16 for conveying information to a user 14, and a communication interface 18 for communicating with external devices directly, or indirectly via a computer network. The components of the host device 13 may be connected via an internal bus configured for handling data communication and processing operations.

In an embodiment, as described before in connection with Fig. 4, the host device 13 may include the eye tracking module 11 and the display 16, e.g. in case the host device 13 is a head-mounted device 20 such as AR or VR glasses. In another possible embodiment, the head-mounted device 20 comprising the eye tracking module 11 and the display 16 is a separate device from the host device 13 and data connection between the head-mounted device 20 and the host device 13 is established via wired or wireless connection through a communications interface 18. Similarly, data connection between the hand gesture module 12 and the host device 13 can also be established via wired or wireless connection through a communications interface 18.

In some embodiments, the processor 15 can be configured to execute various program codes. The implemented program codes can provide an AR or VR environment through the display 16. In some embodiments the host device 13 further comprises a memory. In some embodiments the processor 15 is coupled to the memory. The memory can be any suitable storage means. In some embodiments the memory comprises a program code section for storing program codes implementable upon the processor 15. Furthermore, in some embodiments the memory can further comprise a stored data section for storing data, for example data that has been recorded or analysed in accordance with the application. The implemented program code stored within the program code section, and the data stored within the stored data section can be retrieved by the processor 15 whenever needed via the memory-processor coupling.

In some further embodiments the host device 13 can provide a graphical user interface (GUI) via the display 16. In some embodiments the processor 15 can control the operation of the GUI and receive inputs from the GUI. In some embodiments the GUI can enable a user 14 to input commands to the host device 13, for example via a keypad, and/or to obtain information from the apparatus 1 , for example via a display 16. The GUI can in some embodiments comprise a touch screen or touch interface capable of both enabling information to be entered to the host device 13 and further displaying information to the user 14.

The various aspects and implementations have been described in conjunction with various embodiments herein. However, other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed subject-matter, from a study of the drawings, the disclosure, and the appended claims. In the claims, the word “comprising” does not exclude other elements or steps, and the indefinite article “a” or “an” does not exclude a plurality. A single processor or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measured cannot be used to advantage. A computer program may be stored/distributed on a suitable medium, such as an optical storage medium or a solid-state medium supplied together with or as part of other hardware, but may also be distributed in other forms, such as via the Internet or other wired or wireless telecommunication systems.

The reference signs used in the claims shall not be construed as limiting the scope.

Claims

1 . A system (10) for determining user input comprising: an eye tracking module (11 ) configured to identify a gaze location (21 ) of a user (14) and determine a gaze vector (22) based on said gaze location (21 ); a hand gesture module (12) configured to identify a hand gesture (25) of a user (14) and determine an interaction signal (26) based on said hand gesture (25); and a host device (13) in data connection with said eye tracking module (11 ) and said hand gesture module (12), the host device (13) comprising a processor (15) configured to determine a virtual location (23) based on said gaze vector (22), determine an interaction (27) based on said interaction signal (26), and execute a function (28) at said virtual location (23) corresponding to said interaction (27).

2. The system according to claim 1 , wherein said hand gesture module (12) comprises at least one of a hand-mounted unit, a wrist-mounted unit, or a finger- mounted unit, and wherein said hand gesture module (12) is further configured to identify said hand gesture (25) of said user (14) by detecting a signal (24) indicating at least one of a movement or an intention of movement of at least one finger of said user (14).

3. The system according to claim 2, wherein said hand gesture module (12) is further configured to identify a hand gesture (25) comprising the user (14) pressing together a thumb (T) and any of the remaining four fingers (l,M,R,P) on one hand separately or in any possible combination, and wherein said processor (15) is further configured to determine said interaction (27) based on reference data for said interaction signal (26), said reference data comprising: in case of pressing together a thumb (T) and an index finger (I), determine “Select” as interaction (27), in case of pressing together a thumb (T) and a middle finger (M), determine “Back” as interaction (27), in case of pressing together a thumb (T) and a ring finger (R), determine “Settings or Menu” as interaction (27), in case of pressing together a thumb (T) and a pinky finger (P), determine “Home” as interaction (27).

4. The system according to any one of claims 2 or 3, wherein said hand gesture module (12) is a wrist-mounted unit comprising at least one surface nerve conduction, SNC, sensor (19) configured to detect at least one SNC signal (24A), and wherein said hand gesture module (12) is further configured to identify said hand gesture (25) based on said at least one SNC signal (24A).

5. The system according to claim 4, wherein said hand gesture module (12) is further configured to identify pressure p applied by said at least one finger of said user (14) based on the amplitude and a frequency of said at least one detected SNC signal (24A).

6. The system according to any one of claims 1 to 5, further comprising: a head-mounted device (20) comprising said eye tracking module (11) and a display (16) configured to provide Augmented Reality or Virtual Reality to said user (14), said Augmented Reality or Virtual Reality comprising at least one virtual object (29); wherein said eye tracking module (11 ) is further configured to determine said gaze vector (22) by determining a gaze of the user (14) on said display (16) by tracking eye movement of said user (14), identifying a focus area (30) on said display (16) based on the gaze of the user (14) being located in an area of said display (16) for a time duration t exceeding a predefined focus time threshold tr, and determining coordinates of said gaze vector (22) based on coordinates of said focus area (30) on said display (16); and wherein said processor (15) is further configured to identify a selected virtual object (29) based on said gaze vector (22), said selected virtual object (29) being displayed in said focus area (30), and to execute a function (28) of said selected virtual object (29) corresponding to said interaction (27).

7. The system according to claim 6, wherein said head-mounted device (20) is a host device (13) comprising said processor (15); and wherein said processor (15) is further configured to execute a function (28) of said selected virtual object (29) corresponding to at least one of a “Select” interaction or a “Back” interaction determined based on said interaction signal (26).

8. A method for determining user input, the method comprising: identifying (101 ) a gaze location (21 ) of a user (14) and determining (102) a gaze vector (22) based on said gaze location (21 ) using an eye tracking module (11 ); identifying (103) a hand gesture (25) of a user (14) and determining (104) an interaction signal (26) based on said hand gesture (25) using a hand gesture module (12); determining (105) a virtual location (23) based on said gaze vector (22); determining (106) an interaction (27) based on said interaction signal (26); and executing (107) a function (28) at said virtual location (23) corresponding to said interaction (27).

9. The method according to claim 8, wherein said hand gesture (25) is identified by at least one of a hand-mounted unit, a wrist-mounted unit, or a finger-mounted unit, and wherein identifying said hand gesture (25) further comprises: detecting a signal (24) indicating at least one of a movement or an intention of movement of at least one finger of said user (14).

10. The method according to claim 9, wherein identifying (103) said hand gesture (25) comprises identifying the user (14) pressing together a thumb (T) and any of the remaining four fingers (l,M,R,P) on one hand separately or in any possible combination; and wherein determining (106) said interaction (27) is further based on based on reference data for said interaction signal (26), said reference data comprising: in case of pressing together a thumb (T) and an index finger (I), determine “Select” as interaction (27), in case of pressing together a thumb (T) and a middle finger (M), determine “Back” as interaction (27), in case of pressing together a thumb (T) and a ring finger (R), determine “Settings or Menu” as interaction (27), in case of pressing together a thumb (T) and a pinky finger (P), determine “Home” as interaction (27).

11. The method according to any one of claims 9 or 10, wherein identifying (103) said hand gesture (25) further comprises: detecting at least one surface nerve conduction, SNC, signal (24A) using a wrist-mounted unit comprising at least one SNC sensor (19); and identifying said hand gesture (25) based on said at least one SNC signal

(24 A).

12. The method according to claim 11 , wherein identifying (103) said hand gesture (25) further comprises: identifying pressure p applied by said at least one finger of said user (14) based on the amplitude and a frequency of said at least one detected SNC signal (24 A).

13. The method according to any one of claims 8 to 12, further comprising: providing (201 ) Augmented Reality or Virtual Reality to said user (14) on a display (16) of a head-mounted device (20), said head-mounted device comprising an eye tracking module (11 ), and said Augmented Reality or Virtual Reality comprising at least one virtual object (29); determining (202) a gaze of the user (14) on said display (16) by tracking eye movement of said user (14) using said eye tracking module (11 ); identifying (203) a focus area (30) on said display (16) based on the gaze of the user (14) being located in an area of said display (16) for a time duration t exceeding a predefined focus time threshold tr, determining (204) coordinates of said gaze vector (22) based on coordinates of said focus area (30) on said display (16); identifying (205) a selected virtual object (29) based on said gaze vector (22), said selected virtual object (29) being displayed in said focus area (30); and executing (206) a function (28) of said selected virtual object (29) corresponding to said interaction (27).

14. The method according to claim 13, wherein said head-mounted device (20) is a host device (13) comprising a processor (15), said processor (15) being configured to execute a function (28) of said selected virtual object (29) corresponding to at least one of a “Select” interaction or a “Back” interaction determined based on said interaction signal (26).

15. A system (10) for determining user input comprising: an eye tracking module (11 ); a hand gesture module (12); a host device (13) in data connection with said eye tracking module (11) and said hand gesture module (12), the host device (13) comprising a processor (15); and a storage device (17) configured to store instructions that, when executed by said processor (15), cause the components of the system to perform a method according to any one of claims 8 to 14.