
WO2025056916A1 - A system for monitoring user gestures or other movement or state - Google Patents

A system for monitoring user gestures or other movement or state

Info

Publication number
WO2025056916A1
Authority
WO
WIPO (PCT)
Prior art keywords
gestures
output
machine learning
distance
learning model
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/GB2024/052386
Other languages
French (fr)
Inventor
Kianoush NAZARPOUR
Chenfei MA
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Edinburgh
Original Assignee
University of Edinburgh
Application filed by University of Edinburgh

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/015Input arrangements based on nervous system activity detection, e.g. brain waves [EEG] detection, electromyograms [EMG] detection, electrodermal response detection

Definitions

  • Another option for supervised dimensionality reduction is Canonical Correlation Analysis.
  • In some embodiments, dimension reduction is executed in the deep learning component or other model.
  • The inputs to the dimension reduction component 30 are multi-channel myoelectric signals from sensors, for example with each channel corresponding to an output from a respective sensor, and the output is a continuous representation of the myoelectric signals in a low-dimensional subspace.
  • In other embodiments, dimension reduction is executed using other dimensionality reduction methods.
  • The instantaneous distance between each sample, which is a point in the LDA-space, and the class centroids was chosen to build the pseudo-label D. This approach resulted in continuous and intuitive feedback when compared to the conventional feedback about decoding outcome likelihood. The Euclidean distance was used but, in other aspects, other distances, e.g., Hamming, Chebyshev, Riemannian, Cosine and Minkowski distances, could also be used.
  • A Savitzky-Golay filter is used to smooth the distance pseudo-labels. It is a time-domain filtering method based on local orthogonal polynomial least-squares fitting. Its hyper-parameters are the window length and filter order, which are determined by an exhaustive search towards higher classification accuracy, over the ranges 10-500 and 1-5 respectively. Using this smoothing approach, the accuracy improved by c. 2%. In other embodiments, smoothing may be performed using any other suitable method. A sketch of this smoothing step follows below.
  • The inputs to the pseudo-label estimation component are a low-dimensional representation of the myoelectric signals and an instantaneous Euclidean distance between each sample in the LDA-space and the class centroids in a low-dimensional subspace, and the output is the continuous pseudo-labels D.
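  • By way of illustration only, a minimal Python sketch of this smoothing step is given below, using SciPy's savgol_filter on time-varying distance pseudo-labels. The stand-in data and the particular window length (101) and polynomial order (3) are assumptions chosen from within the quoted search ranges, not values fixed by the disclosure.

```python
import numpy as np
from scipy.signal import savgol_filter

# Stand-in pseudo-labels: one time-varying distance per gesture, shape (time, 6 gestures).
D = np.abs(np.random.default_rng(0).standard_normal((1000, 6)))

# Savitzky-Golay smoothing: local least-squares polynomial fit over a sliding window.
# window_length must be odd; both values lie within the search ranges quoted above.
D_smooth = savgol_filter(D, window_length=101, polyorder=3, axis=0)
```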
  • The deep learning component 34 estimates the distances between the user's state in the control space and each of the centroids, where each centroid is a centroid in respect of points obtained for a respective one of the gestures.
  • Figures 3A and 3B illustrate in more detail the neural network structure employed as the deep learning components 34, 35 in the embodiment of Figure 2. Figure 3A is a signal flow diagram 38 for the temporal convolutional network (TCN) 42 structure used in this embodiment. The structure 40 of the TCN arrangement used as the deep learning component 34, 35 is shown in Figure 3B.
  • Two blocks of TCN 42 are stacked together, each with one dilated convolution layer 44 at the beginning, as shown in Figure 3B. The layer sizes of the two blocks are 64 and 32 respectively. The kernel size is 7. The dilation grows with the layer number as a power of n, where n is the block number. The dropout is set to 0.5 to keep the TCN out of the over-fitting problem. According to experimental results, the TCN achieved acceptable results on the nonlinear mapping task between surface electromyographic (sEMG) features and distances.
  • The TCN receives features in the lower dimensionality latent space and outputs estimated distances, or measures determined on the basis of the estimated distances, to the centroids or other positions of the gesture classifications.
  • The output of the continuous pseudo-labelling component 32 shown in Figure 2 is fed as input to the first one-dimensional convolution layer 44. The output of the convolution layer is normalised in the normalisation component 46. The chomp component 48 then removes a portion of the signal which is the size of the zero padding added in the convolution layer 44. The activation component 50 then calculates the output of the node based on its inputs and the weights on individual inputs. The dropout layer 52 may ensure that the system does not over-fit the solution. This is followed by another one-dimensional convolution layer 44, a normalisation component 46, a chomp component 48, an activation component 50 and a dropout layer 52.
  • During training, the inputs to the deep learning component 34 are the extracted features, with the smoothed pseudo-labels D serving as targets, and the output is the estimated distances between the user's state in the control space and the centroids of all gestures. The classifier 36 chooses the gesture with the lowest estimated distance as its output, as illustrated in the sketch below.
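  • By way of illustration only, a minimal PyTorch sketch of a TCN of this shape is given below: two stacked blocks with layer sizes 64 and 32, kernel size 7, dropout 0.5, and a chomp step that removes the causal zero-padding. The input feature size, the use of batch normalisation for the normalisation component, ReLU for the activation, and a final linear head producing one estimated distance per gesture are illustrative assumptions, not details fixed by the disclosure.

```python
import torch
import torch.nn as nn

class Chomp1d(nn.Module):
    """Removes the trailing zero-padding added by the causal convolution (chomp 48)."""
    def __init__(self, size: int):
        super().__init__()
        self.size = size

    def forward(self, x):
        return x[:, :, :-self.size]

def tcn_block(in_ch: int, out_ch: int, kernel: int, dilation: int) -> nn.Sequential:
    # Convolution 44 -> normalisation 46 -> chomp 48 -> activation 50 -> dropout 52, twice.
    pad = (kernel - 1) * dilation
    return nn.Sequential(
        nn.Conv1d(in_ch, out_ch, kernel, padding=pad, dilation=dilation),
        nn.BatchNorm1d(out_ch), Chomp1d(pad), nn.ReLU(), nn.Dropout(0.5),
        nn.Conv1d(out_ch, out_ch, kernel, padding=pad, dilation=dilation),
        nn.BatchNorm1d(out_ch), Chomp1d(pad), nn.ReLU(), nn.Dropout(0.5),
    )

class DistanceTCN(nn.Module):
    def __init__(self, n_features: int = 16, n_gestures: int = 6):
        super().__init__()
        self.blocks = nn.Sequential(
            tcn_block(n_features, 64, kernel=7, dilation=1),  # block 1, layer size 64
            tcn_block(64, 32, kernel=7, dilation=2),          # block 2, layer size 32, larger dilation
        )
        self.head = nn.Linear(32, n_gestures)  # one estimated distance per gesture

    def forward(self, x):                     # x: (batch, n_features, time)
        h = self.blocks(x)                    # (batch, 32, time)
        return self.head(h.transpose(1, 2))   # (batch, time, n_gestures) distance estimates

# The classifier 36 then picks, at each instant, the gesture with the lowest
# estimated distance: gesture = model(x).argmin(dim=-1)
```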
  • Each of the dimensionality reduction, continuous pseudo-labelling, and deep learning components can be implemented in any suitable manner in other embodiments, and is not limited only to the implementations discussed in relation to Figures 2 and 3.
  • The continuous pseudo-label in the system of Figures 2 and 3 is the Euclidean distance, which could for example be replaced by the Jaccard distance to include a set of samples instead of a pair of samples. This could increase robustness by turning point-to-point labelling into sequence-to-sequence labelling. Alternatively, point-to-sequence labelling may be provided, which could measure the distance between the current sample and the target distribution.
  • The LDA decoder component 22 may also be replaced or modified in other embodiments, for example with Linear Optimal Low-Rank Projection, in order to achieve a better fit to the data while processing.
  • In Figure 2B, the system 20b is illustrated in its trained state. The system 20b comprises a trained deep learning component 35 with the ability to estimate D directly from the features extracted by the feature extraction component 26. The system follows a similar signal flow to the untrained system 20a, except that an input signal, comprising a set of features extracted by the feature extraction component 26, is provided to the trained machine learning model 35. The trained machine learning model is trained on a plurality of gestures as discussed.
  • The LDA component 22 and the decoder component 24 are shown side by side in a configuration suitable for comparing the performance of the LDA decoder 28 and the DistaNet decoder component 24. In other embodiments, the parallel LDA component 28 is omitted, as mentioned above.
  • Figures 4A to 4D show signals at different points in the system in one example. Figure 4A shows 25 seconds of myoelectric signals recorded from the surface of the skin on the forearm of a participant whilst they performed different gestures. The gestures of interest in the example are referred to as Pointer, Tripod, Open, Rest, Lateral and Power.
  • Figure 4B depicts two attributes (features) of the signals from each channel, namely waveform length and the logarithm of variance.
  • Figure 4C illustrates the outputs of the deep learning component 35, which are six estimated distances, denoted D, to the six gestures in the task according to the example, including the rest gesture. The classifier 36 chooses the gesture with the lowest D as its output. The system is not limited to six gestures and may incorporate a higher or lower number of gestures in other embodiments.
  • Figure 4E shows a task state machine in relation to the tasks that are the subject of Figure 4.
  • Figure 7C represents the experiment design, including the type and the number of blocks during the two experimental days. Figure 7D reports the full arrangement of the experiment in terms of the decoder and the availability of scores and visual feedback in the Practice and Test blocks. No feedback was provided in the Baseline block.
  • Participants in the LDA and DistaNet-S groups received score feedback only, and participants in the DistaNet-SD group received both score and distance visual feedback. The participants did not receive any feedback in the Test blocks. While time-varying visual feedback was used for the purpose of this experiment, the discussion is equally valid in the case of auditory, tactile, thermoreceptive and olfactory feedback.
  • The experiment was performed on an HP EliteBook 840 G8 laptop computer (2.6 GHz i5-1145G7 CPU, 16 GB RAM, HP Inc., California, U.S.), which in this example was used as the computing apparatus 12 that comprises a processing resource 18. Real-time experimental software was implemented in Python using the AxoPy library.
  • This processing resource 18 is used to receive measurements from the at least one sensor, apply a dimensionality reduction process, train a machine learning model, provide inputs comprising a set of features based on the measurements to the trained machine learning model, and provide a real-time output to the user representing or determined from the determined distance. The dimensionality reduction process may be applied as part of the machine learning model, or the dimensionality reduction may be applied to the input before it is sent to the machine learning model.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Dermatology (AREA)
  • Neurosurgery (AREA)
  • Neurology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • User Interface Of Digital Computer (AREA)
  • Image Analysis (AREA)

Abstract

A system for monitoring user gestures or other movement or state comprises: at least one sensor configured to perform measurements of muscle, nerve and/or brain activity of a user; a processing resource configured to: receive measurements from the at least one sensor; provide inputs comprising a set of features based on the measurements to a trained machine learning model, wherein the machine learning model is trained to determine distance in a latent space that is of lower dimensionality than a dimensionality of the set of input features, and the distance represents closeness of a position to a centroid or other position in the latent space that corresponds to a classification as a respective one of a plurality of different gestures on which the machine learning model is trained; and provide an output to the user representing or determined from the determined distance, thereby representing similarity between the gesture or other movement or state by the user and one or more gestures or other movements or states of interest, wherein the output varies as the muscle, nerve and/or brain activity of the user varies.

Description

A system for monitoring user gestures or other movement or state
Introduction
The present disclosure relates to a system, method and apparatus for monitoring user gestures or other movement, and more particularly, the present disclosure relates to a system, method and apparatus for recognising a gesture from a set of possible gestures by processing signals from sensors that detect muscular or other activity.
Background
The electrical activity of muscles, e.g. the myoelectric signals, is measured conventionally from the surface of the skin. Beyond diverse applications in neuroscience and clinical neurology, a common use of these signals is in the control of active effectors, e.g. prostheses and exoskeletons, or to enable interaction with objects in virtual, augmented, and mixed reality environments. Machine learning enables the estimation of user intents by decoding distinct grasp or movement signatures from the myoelectric signals. For instance, machine-learning models of forearm and wrist muscle activity that can adapt to a user’s unique typing patterns and enable personalised virtual keyboards are in development.
US10754434B2 describes systems and techniques for gesture capture. A first sensor array may be used to determine a pose status from a library of poses. A second sensor array may be used to determine a fit with a model. The fit may be provided to a gesture consumer based on the fit and the pose. US8761437B2 describes representing motion of the human body by a skeletal model derived from an image of the user and used to perform motion recognition and/or similarity analysis of body motion. CN204009751U describes a gesture identifying device that utilizes an accelerometer and gyro sensor to realize gesture identification. WO 2020/111344 A1 describes a system for implementing physical motion in a virtual space by using an electromyographic signal. A virtual avatar is generated performing a motion matching a pre-stored electromyographic signal pattern. The avatar is configured to perform a matching motion according to the pattern and the motion of the avatar is visually displayed. US20200275895A1 describes methods and apparatus for training a classification model and using the trained classification model to recognize gestures performed by a user. US20090327171A1 describes a machine learning model trained by instructing a user to perform prescribed gestures.
In conventional motor learning tasks, in which the relationship between the muscle activity and the task is one-to-one, linear, and simple, practice and feedback, including bio-feedback, can improve performance and reduce undesired variability in the relevant degrees of freedom. This improvement in performance may persist over time, that is, the delivery of feedback may support the retention of new myoelectric control skills. However, existing machine learning-based approaches to myoelectric control cannot deliver continuous, intuitive, and smooth feedback about the control space or myoelectric variability, relevance, or redundancy to the user.
Similarly, both one-day and multi-day studies of myoelectric control with machine learning-based decoders reveal improvement in control with practice, with fixed, recalibrated, or adaptive decoders. Nonetheless, it is debatable whether these improvements are temporary, reflecting motor adaptation, or long-term, supporting motor learning. A common component of the studies of myoelectric adaptation and learning is feedback, be it presented visually on a screen or with a prosthesis, or delivered with electro- or vibro-tactile stimuli. Extrinsic feedback typically provides the user with task results, e.g., target hits in a typical motor control task, or knowledge of the quality of control, e.g., path efficiency in the control space. If motor learning underpins these improvements, one would expect to observe the retention of myoelectric skills after learning. The retention of myoelectric skills can only be measured in the absence of feedback.
One fundamental challenge with the use of current black-box and typically non-linear machine learning algorithms, e.g., deep learning, is that the mapping of high-dimensional myoelectric signals to low-dimensional task-related feedback will result in discrete, jittery, and non-intuitive control signals. This limitation is compounded by the intrinsic noise in the myoelectric signals and the inaccuracy of known machine learning algorithms in the decoding of myoelectric signals.
Summary
In a first aspect, there is provided a system for monitoring user gestures or other movement or state, comprising: at least one sensor configured to perform measurements, for example electrical, magnetic, capacitive, and/or mechanical measurements, of muscle, nerve and/or brain activity of a user; a processing resource configured to: receive measurements from the at least one sensor; provide inputs comprising a set of features based on the measurements to a trained machine learning model, wherein the machine learning model may be trained to determine distance in a space, for example a latent space that is of lower dimensionality than a dimensionality of the set of input features, and the distance for instance represents closeness of a position, for example corresponding to features determined from measurements by the or a sensor, to a centroid or other position in the space that corresponds to a classification as a respective one of a plurality of different gestures on which the machine learning model is trained, wherein the processing resource may be configured to provide an output, for example a real-time output, to the user representing or determined from the determined distance, thereby for example representing similarity between the gesture or other movement or state by the user and one or more gestures or other movements or states of interest, wherein the output may vary as the muscle, nerve and/or brain activity of the user varies.
The output may represent similarity to a selected one of the plurality of gestures, movements or states or to each of the plurality of gestures, movements or states.
The output may include a plurality of time-varying outputs, each time-varying output representing similarity to a respective one of the plurality of gestures or other movements or states.
The time-varying output or each of the time-varying outputs may comprise or represent a respective single numerical value that represents similarity to a corresponding gesture, movement or state. Each single numerical value may represent distance in the latent space.
The distance in the latent space may be represented by a suitable metric and/or may comprise a Euclidean distance, a Jaccard distance, a Hamming distance, a Chebyshev distance or a Minkowski distance.
The processing resource may be configured to apply a dimensionality reduction process, either included in the machine learning model or as part of a pre-processing step before input of the features to the machine learning model.
The dimensionality reduction process may comprise application of at least one dimensionality reduction method and/or may comprise at least one of Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA), Locally Linear Embedding (LLE), Laplacian eigenmaps (LE), t-distributed stochastic neighbour embedding (t-SNE), or Isomap.
The processing resource may be configured to apply a smoothing process to smooth the real-time output, wherein the smoothing process is included in the machine learning model and/or applied as a post-processing step to the output of the machine learning model.
The smoothing process may comprise applying a Savitzky-Golay filter or other time-dependent filter.
The model may comprise a temporal convolutional network (TCN). The TCN may be configured to receive features in the lower dimensionality latent space and to output estimated distances to the centroids or other positions of the classifications of the gestures, movements or states.
The output may comprise at least one graphical indicator whose appearance may vary depending on the determined distance(s), for example a bar, line or other regular or irregular shape. The varying of appearance may comprise at least one of varying size, shape, position, colour or texture.
The at least one sensor may be configured to output measurement signals on a plurality of measurement channels. The inputs to the machine learning model may be obtained based on the measurement signals on the plurality of channels.
The at least one sensor may be configured to sense activity of at least 5 muscles of interest, optionally between 10 and 50 muscles of interest, optionally between 15 and 35 muscles of interest.
The set of features provided as inputs may comprise at least one of an amplitude, peak width or duration of a measurement signal.
The at least one sensor may comprise at least one muscle activity sensor or nerve activity sensor, optionally at least one myoelectric sensor, at least one myomagnetic sensor and/or at least one myomechanic sensor.
The gesture(s) may comprise hand gesture(s) and/or facial gesture(s).
The machine learning model may be trained based on pseudo-labels for training examples of the different gestures, movements or states. The pseudo-labels may represent or be determined from distance from the centroid or other position in the latent space for the gestures, movements or states.
In a further aspect, which may be provided independently, there is provided a method of monitoring user movement, comprising: receiving electrical, magnetic, capacitive, and/or mechanical measurements of muscle, nerve and/or brain activity of a user; providing inputs comprising a set of features based on the measurements to a trained machine learning model, wherein the machine learning model is trained to determine distance in a latent space that is of lower dimensionality than a dimensionality of the set of input features, and the distance represents closeness of a position, for example corresponding to features determined from measurements by the or a sensor, to a centroid or other position in the latent space that corresponds to a classification as a respective one of a plurality of different gestures, movements or states on which the machine learning model is trained; and providing an output, for example a real-time output, to the user representing or determined from the determined distance, thereby representing similarity between the gesture, movement or state by the user and one or more gestures, movements or states of interest, wherein the output varies as the muscle, nerve and/or brain activity of the user varies.
In another aspect, which may be provided independently, there is provided a system for training a machine learning model for monitoring user gestures or other movement or state, comprising a processing resource configured to: receive a set of measurements, for example electrical, magnetic, capacitive, and/or mechanical measurements, of muscle, nerve and/or brain activity of a plurality of subjects, each measurement corresponding to a known gesture, movement or state of a set of gestures, movements or states by one of the subjects; for each measurement provide an input comprising sets of features based on the measurement to a dimensionality reduction component that is configured to provide outputs of lower dimensionality than the sets of inputs; for each dimensionally-reduced output determine a label, for example a pseudo-label, that represents the closeness of a position to a centroid or other position in a latent space that corresponds to a classification of the gesture, movement or state to which the output corresponds; and, using the labels, for example pseudo-labels, train the machine learning model to determine, from an input that comprises features determined from measurements, an output that comprises or represents distance to one or more gestures, movements or states of the set of gestures, movements or states.
The dimensionality reduction may comprise application of at least one dimensionality reduction method and/or comprises at least one of Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA), Locally Linear Embedding (LLE), Laplacian eigenmaps (LE), t-distributed stochastic neighbour embedding (t-SNE), or Isomap.
The processing resource may be configured to apply a smoothing process to smooth the time-varying dimensionally-reduced outputs and/or to smooth the time-varying labels. The smoothing process may comprise a Savitzky-Golay filter or other time-dependent filter.
The model may comprise a temporal convolutional network (TCN).
In a further aspect, which may be provided independently, there is provided a method of training a machine learning model, comprising: receiving a set of measurements, for example electrical, magnetic, capacitive, and/or mechanical measurements, of muscle, nerve and/or brain activity of a plurality of subjects, each measurement corresponding to a known gesture, movement or state of a set of gestures, movements or states by one of the subjects; for each measurement providing an input comprising sets of features based on the measurement to a dimensionality reduction component that is configured to provide outputs of lower dimensionality than the sets of inputs; for each dimensionally-reduced output determining a label, for example a pseudo-label, that represents the closeness of a position to a centroid or other position in a latent space that corresponds to a classification of the gesture, movement or state to which the output corresponds; and, using the labels, for example pseudo-labels, training the machine learning model to determine, from an input that comprises features determined from measurements, an output that comprises or represents distance to one or more gestures, movements or states of the set of gestures, movements or states.
There is also provided a computer program product comprising computer-readable instructions that are executable to perform a method as claimed or described herein.
In another aspect, which may be provided independently, there is provided a trained machine learning model, trained to determine distance in a latent space that is of lower dimensionality than a dimensionality of a set of input features, wherein the distance represents closeness of a position, for example corresponding to features determined from measurements by a sensor, to a centroid or other position in the latent space that corresponds to a classification as a respective one of a plurality of different gestures, movements or states on which the machine learning model is trained. The system may be configured to provide simple, continuous and target-specific feedback.
The system may address limitations of modern machine learning algorithms in processing myoelectric signals, namely, the explainability and smoothness of low-dimensional feedback. Addressing these two challenges enables studies of myoelectric control learning and retention with machine learning decoders.
Features of one aspect may be provided as features of any other aspect, in any appropriate combination. For example, any one of system, method, apparatus or computer program product features may be provided as any one or more other of system, method, apparatus or computer program product features.
Brief description of the drawings
Various embodiments will now be described by way of example only, and with reference to the accompanying drawings, of which:
Figure 1 is a schematic diagram of an apparatus according to an embodiment;
Figure 2 is a schematic diagram showing the elements of the system in neural network training and testing phases;
Figure 3 is a schematic diagram showing the structure of the neural network;
Figure 4 is a set of graphs that illustrate several experimentally obtained signals and processed signals associated with the system;
Figure 5 is a three-dimensional plot showing the locations of the user’s state and class centroids in a low-dimensional sub-space;
Figure 6 is an output displayed to a user while using the system according to an embodiment;
Figure 7 illustrates features of an experiment performed using the system of an embodiment;
Figure 8 shows graphs that illustrate averaged real-time performance results for practice and test block for three groups in an experiment performed using a system of an embodiment;
Figure 9 is a graph showing average performance per group in a baseline block;
Figure 10 shows graphical illustrations of confusion matrices and latent spaces according to an embodiment;
Figure 11 is a flowchart that illustrates steps taken to train a neural network according to an embodiment; and
Figure 12 is a flowchart that illustrates steps taken to estimate distances using a trained neural network according to an embodiment.
Detailed description
A system 10 for monitoring user movement according to an embodiment is illustrated schematically in Figure 1. The system for monitoring user movement 10 comprises a computing apparatus 12, in this case a personal computer (PC) or workstation, which is connected to a sensor 14.
The system 10 further comprises one or more display screens 16, 106. In this embodiment, screen 16 is used to output performance data in real-time to the user while screen 106 is used to present data to the user during the neural network training phase. A speaker 108 is included in the system and is used, for example, to deliver audible beeps or other suitable sounds to the user in order to assist with use of the system.
In the present embodiment, the sensor 14 is a surface electromyographic (sEMG) sensor, but in other embodiments, it can be any device that can detect the state of and/or changes in muscular tissue or muscle activity, or nerve activity, for example peripheral nerve activity, or brain activity, or spinal cord activity or any other physiological state or process of interest.
Some examples of sensor technologies that can be used to detect user movements or muscle activity or other activity state or process include electrical, magnetic, capacitive and mechanical sensors. The sensor 14 is configured to generate data that is representative of at least one anatomical region of a patient or other subject. In some embodiments, there may be more than one sensor. In some embodiments, the sensors are used to recognize gestures made by the user and can monitor user movement.
The gestures may comprise muscular movement, position and/or relative position of musculature, optionally at least one of hand gestures and facial gestures. Any suitable muscles or sets of muscles may be the subject of the sensor measurements. While electromyographic signals, in particular surface electromyographic (sEMG) signals, are discussed, these can be replaced with capacitive signals, magnetomyographic signals, mechanomyograms, ultrasound myographic signals, infrared myographic signals, fiber optic myographic signals and the like in other embodiments. The above signals may, for example, be obtained from muscles, the spinal cord, nerves or the brain or a combination of these. The signals may include non-muscular signals measured from the anatomy.
In the present embodiment, data obtained by the sensor 14 is provided to computing apparatus 12.
Computing apparatus 12 comprises a processing resource 18 for processing of data. The processing resource 18 of Figure 1 comprises a central processing unit (CPU) and Graphical Processing Unit (GPU), and may further comprise a Tensor Processing Unit (TPU). The processing resource 18 provides for automatically or semi-automatically processing data from the sensor 14. In other embodiments, the data to be processed may comprise any other suitable data obtained from measurements performed with respect to a patient or subject, which may be other than measurements based on muscular changes. The processing resource 18 is also configured to receive measured signals from the at least one sensor 14 and provide a set of features extracted from the measured signals to a trained machine learning model.
The processing resource 18 in this embodiment includes model training circuitry 100 configured to train a neural network or other model, processing circuitry 102 configured to execute a trained neural network or other model, and control circuitry 104 configured to route the input from the sensor to the various parts of the processing resource 18 and to display the outputs from these parts on the display screens 16, 106. In other embodiments, the model training circuitry and processing circuitry may be provided separately in different apparatus or in different locations. The or a model may be pre-trained, at least partially.
In the present embodiment, the circuitries 100, 102, 104 are each implemented in the CPU and/or GPU and/or TPU by means of a computer program having computer-readable instructions that are executable to perform the method of the embodiment. In other embodiments, the circuitries may be implemented as one or more ASICs (application specific integrated circuits) or FPGAs (field programmable gate arrays). Any suitable processing resource may be used, not limited only to a CPU and/or GPU and/or TPU.
The computing apparatus 12 also includes a hard drive and other components of a PC including RAM, ROM, a data bus, an operating system including various device drivers, and hardware devices including a graphics card. Such components are not shown in Figure 1 for clarity.
The apparatus 10 of Figure 1 is configured to perform a method that is now described with reference to Figure 2.
The system of Figure 1 provides a neural network with inputs comprising the conventional features of multi-channel myoelectric signals and outputs comprising estimated distances, denoted D, between a user’s state in a control space and the centroids of a set of possible gestures. This measure is a unit-free and abstract construct that meets three requirements, namely 1) target-specificity, 2) continuity, and 3) smoothness. To meet these objectives, the neural network creates continuous pseudo-labels for the input data in a low-dimensional control space and uses these pseudo-labels to train a deep neural network. The machine learning model is trained based on pseudo-labels for training examples of the different gestures, the pseudo-labels representing or being determined from distance from the centroid or other position in the latent space (or low-dimensional sub-space) for the gestures. The neural network in certain embodiments is referred to using the term DistaNet.
Figures 2A and 2B depict schematically the operation of the system in the neural network training 20a and testing or use 20b phases respectively. The system in the untrained phase 20a illustrated here includes the decoder portion 24 and the classifier 36. Figure 2A also shows a Linear Discriminant Analysis (LDA) decoder element 28. The LDA component in this example is used on features obtained from the EMG data, to obtain a comparison of results using the decoder portion 24 according to an embodiment and results using the LDA decoder element 28. It will be understood that a separate parallel LDA component 28 is not needed in embodiments; it is provided in Figure 2A for comparison purposes. Figure 2A illustrates the training phase. Figure 2B illustrates the real-time operation of the trained system, where EMG features are directly mapped to instantaneous D estimates.
Referring to Figure 2A, during the training phase the EMG data from the sensors is fed to a feature extraction step 26. In this embodiment, the features extracted from the sEMG signals are waveform length (WL) and log-variance (log-var). In other embodiments, other features may be extracted from the raw EMG data.
The decoder component 24 in this embodiment is a machine learning architecture with three stages or portions, namely dimension reduction 30, pseudo-label estimation 32, and deep learning (or neural network mapping) 34. Dimensionality reduction may be achieved using Linear Discriminant Analysis (LDA), implemented by component 30 in this embodiment.
Dimension reduction 30 transforms high-dimensional samples into a representation in a low-dimensional sub-space.
Pseudo-label estimation 32 maps the low-dimensional samples to continuous pseudo-labels in the form of distances between the user state and the centroids of the classes, and then smooths the pseudo-labels by polynomial fitting. The continuous pseudo-labels are denoted D in Figure 2A. Smoothing of the real-time output may also be carried out after the deep learning component 34.
The terms low-dimensional sub-space, latent space, control space, control manifold and LDA-space are used herein to refer to a space in which the data is represented following dimensionality reduction.
It is noted that a point (e.g. the user state) in the control space, also referred to as the latent space, may be defined by a set of numbers in a vector. If the space is higher-dimensional then the point is defined by a larger set of numbers and hence the vector is longer. In a 3D space, each point is defined by three numbers x, y, and z. By continuous pseudo-labelling it can be ensured that point(time_0), point(time_1) and point(time_2) represent a smooth trajectory (e.g. not a jagged line). Finally, the deep learning component 34 (neural network) learns to estimate distances. The deep learning component 34, once in its trained state and in use, is referred to using reference numeral 35, to distinguish between the trained and in-training states.
In an EMG-based human-machine interface application, for example in a metaverse, virtual/extended reality, upper-limb prosthetic or other application, EMG signals can be acquired from multiple channels, e.g. from multiple sensors. A decoding algorithm needs to deal with a high-dimensional data space of size 'number of channels' x 'number of features'. Although feature extraction methods reduce data complexity, dimension explosion remains a considerable challenge, especially when high-density EMG signals are considered. For a more intuitive, continuous, objective, and visualisable space for users, dimensionality reduction is desirable, for example to three dimensions. Dimension reduction results in a latent space (also referred to as control space or low-dimensional sub-space) of lower dimensionality in comparison to the dimensionality of the set of input features. Each of the plurality of gestures on which the machine learning model is trained can occupy a centroid or other position in this latent space. The machine learning model is trained to determine a distance in this latent space between the input, based on measurement signals on the plurality of channels, and the centroids/positions of the plurality of different gestures.
A wide variety of dimensionality reduction methods are available, e.g., Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA), Locally Linear Embedding (LLE), Laplacian eigenmaps (LE), t-distributed stochastic neighbour embedding (t-SNE), Isomap, etc. Further options for supervised dimensionality reduction, where labelling of the data is necessary, include Canonical Correlation Analysis and Local Discriminant Embedding. Options for unsupervised dimensionality reduction, which does not require labelled data, include Independent Component Analysis and the use of an autoencoder.
To maintain high prediction accuracy and robustness, the LDA approach was used for dimensionality reduction in the embodiment of Figure 2.
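By way of illustration only, and not forming part of the disclosed embodiments, a minimal sketch of such an LDA-based reduction using scikit-learn is given below; the array shapes, the six-class labelling and the random data are assumptions made purely for the example:

```python
# Hedged sketch: reduce windowed EMG features to a 3-D control space with LDA.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = rng.standard_normal((600, 16))   # 600 windows x (8 channels x 2 features)
y = rng.integers(0, 6, size=600)     # labels for 6 classes (5 grasps + rest)

lda = LinearDiscriminantAnalysis(n_components=3)
Z = lda.fit_transform(X, y)          # samples in the low-dimensional control space

# One centroid per gesture class in the latent space
centroids = np.stack([Z[y == c].mean(axis=0) for c in range(6)])
print(Z.shape, centroids.shape)      # (600, 3) (6, 3)
```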
In some embodiments, dimension reduction is executed in the deep learning component or other model. In an embodiment, the input to the dimension reduction component 30 is multi-channel myoelectric signals from sensors, for example with each channel corresponding to an output from a respective sensor, and the output is a continuous representation of the myoelectric signals in a low-dimensional subspace.
In some embodiments, dimension reduction is executed using dimensionality reduction methods such as those listed above.
In the Continuous Pseudo-labelling component 32, the instantaneous distance between each sample, which is a point in the LDA-space, and the class centroids was chosen to build the pseudo-label D. This approach provides more continuous and intuitive feedback than conventional feedback about the decoding outcome likelihood. We used the Euclidean distance, but in other embodiments other distances, e.g., Hamming, Chebyshev, Riemannian, Cosine and Minkowski distances, could also be used. We further used a Savitzky-Golay filter to smooth the distance pseudo-labels. This is a time-domain filtering method based on local orthogonal polynomial least-squares fitting. Its hyper-parameters are the window length and the filter order, which are determined by an exhaustive search towards higher classification accuracy, over the ranges 10-500 and 1-5, respectively. Using this smoothing approach, we improved the accuracy by c. 2%. In other embodiments, smoothing may be performed using any other suitable method.
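A hedged sketch of this pseudo-labelling step is given below, assuming a trajectory Z in the control space and per-class centroids as produced by the dimensionality reduction above; the particular window length and polynomial order are example values within the searched ranges (10-500 and 1-5):

```python
# Hedged sketch: Euclidean distance from each low-dimensional sample to every
# class centroid, smoothed along time with a Savitzky-Golay filter.
import numpy as np
from scipy.signal import savgol_filter

def distance_pseudolabels(Z, centroids, window=51, order=3):
    # Z: (T, d) trajectory in the control space; centroids: (C, d)
    D = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=-1)  # (T, C)
    # Smooth each per-class distance trace over time (window must not exceed T)
    return savgol_filter(D, window_length=window, polyorder=order, axis=0)
```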
In an embodiment, the input to the pseudo-label estimation component is a low-dimensional representation of the myoelectric signals; the component computes the instantaneous Euclidean distance between each sample in the LDA-space and the class centroids in the low-dimensional subspace, and the output is the continuous pseudo-labels D.
The deep learning component 34 estimates the distances between the user’s state in the control space, and each of the centroids, where each centroid is a centroid in respect of points obtained for a respective one of the gestures.
Figures 3A and 3B illustrate in more detail the neural network structure employed as the deep learning components 34, 35 in the embodiment of Figure 2. Figure 3A is a signal flow diagram 38 for the temporal convolutional network (TCN) 42 structure used in this embodiment. The structure 40 of the TCN arrangement used as the deep learning component 34, 35 is shown in Figure 3B. Two blocks of TCN 42 are stacked together, each with one dilated convolution layer 44 at the beginning, as shown in Figure 3B. The layer sizes of the two blocks are 64 and 32 respectively. The kernel size is 7. The dilation grows exponentially with depth, with the block number n as the exponent. The dropout is set to 0.5 to guard the TCN against over-fitting. According to the experimental results, the TCN achieved acceptable results on the nonlinear mapping task between surface electromyographic (sEMG) features and distances.
Other options for encoders include recurrent neural networks, Random Forest regression and Support Vector regression. The TCN receives features in the lower dimensionality latent space and outputs estimated distances or measures determined on the basis of the estimated distances to the centroids or other positions of the gesture classifications.
Referring to Figures 2 and 3, during training the extracted input features are fed to the first one-dimensional convolution layer 44, with the smoothed pseudo-labels from the continuous pseudo-labelling component 32 serving as the regression targets. The output of the convolution layer is normalised in the normalisation component 46. The chomp component 48 then removes a portion of the signal equal in size to the zero padding added in the convolution layer 44. The activation component 50 then calculates the output of the node based on its inputs and the weights on the individual inputs. The dropout layer 52 helps ensure that the system does not over-fit the solution. This is followed by another one-dimensional convolution layer 44, a normalisation component 46, a chomp component 48, an activation component 50 and a dropout layer 52.
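The following PyTorch sketch illustrates one plausible realisation of the block sequence just described (dilated convolution, normalisation, chomp, activation, dropout, repeated twice per block). The kernel size (7), layer sizes (64 and 32) and dropout (0.5) follow the text; the input size, the use of batch normalisation and the exact dilation schedule are assumptions made for the example:

```python
# Hedged sketch of one TCN block: conv -> norm -> chomp -> activation -> dropout.
import torch
import torch.nn as nn

class Chomp1d(nn.Module):
    def __init__(self, chomp_size):
        super().__init__()
        self.chomp_size = chomp_size
    def forward(self, x):
        return x[:, :, :-self.chomp_size]  # trim the zero-padded tail (causal)

def tcn_block(in_ch, out_ch, k=7, dilation=1, dropout=0.5):
    pad = (k - 1) * dilation               # causal padding removed by the chomp
    layers = []
    for ch_in in (in_ch, out_ch):          # two conv sub-layers per block
        layers += [
            nn.Conv1d(ch_in, out_ch, k, padding=pad, dilation=dilation),
            nn.BatchNorm1d(out_ch),        # normalisation type is an assumption
            Chomp1d(pad),
            nn.ReLU(),
            nn.Dropout(dropout),
        ]
    return nn.Sequential(*layers)

# Two stacked blocks (sizes 64 and 32) and a 1x1 head for six distance outputs
net = nn.Sequential(tcn_block(16, 64, dilation=1),
                    tcn_block(64, 32, dilation=2),
                    nn.Conv1d(32, 6, kernel_size=1))
x = torch.randn(1, 16, 200)                # (batch, feature channels, time)
print(net(x).shape)                        # torch.Size([1, 6, 200])
```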
In an embodiment, the input to the deep learning component 34 is the set of extracted features, with the smoothed pseudo-labels D serving as the training targets, and the output is the estimated distances, also denoted D, between the user's state in the control space and the centroids of all gestures.
Finally, the classifier 36 chooses the gesture with the lowest D as its output.
Each of the dimensionality reduction, continuous pseudo-labelling, and deep learning components can be implemented in any suitable manner in other embodiments, and is not limited only to the implementations discussed in relation to Figures 2 and 3. For instance, the continuous pseudo-label in the system of Figures 2 and 3 is the Euclidean distance, which could for example be replaced by the Jaccard distance to include a set of samples instead of a pair of samples. This could increase robustness by turning point-to-point labelling into sequence-to-sequence labelling. Alternatively, by using the Mahalanobis distance, point-to-sequence labelling may be provided, which could measure the distance between the current sample and the target distribution.
The LDA decoder component 22 may also be replaced or modified in other embodiments, for example in order to achieve a better fit to the data during processing. For instance, Linear Optimal Low-Rank Projection (LOL), which is a supervised manifold learning method, could be utilised to reduce the dimensionality while incorporating class-conditional moments. It may provide a better representation of data that exhibits spatial overlap.
Referring now to Figure 2B, the system 20b is illustrated in its trained state. In its trained state, the system 20b comprises a trained deep learning component 35 with the ability to estimate D directly from the features extracted by the feature extraction component 26. The system follows a similar signal flow to the untrained system 20a, except that an input signal comprising a set of features extracted by the feature extraction component 26 is provided directly to the trained machine learning model 35. The trained machine learning model is trained on a plurality of gestures as discussed.
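A minimal sketch of this trained, real-time path is shown below; `trained_model` and the gesture names are placeholders, and the selection rule mirrors classifier 36 (the lowest estimated distance wins):

```python
# Hedged sketch: features from the newest window go straight to the trained
# model, and the gesture with the lowest estimated distance is selected.
import numpy as np

GESTURES = ["Pointer", "Tripod", "Open", "Rest", "Lateral", "Power"]

def classify_window(features, trained_model):
    D_hat = trained_model(features)          # estimated distances, shape (6,)
    return GESTURES[int(np.argmin(D_hat))]   # closest gesture wins
```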
In Figures 2A and 2B, the LDA component 22 and the decoder component 24 are shown side by side in a configuration suitable for comparing the performance of an LDA decoder 28 and the DistaNet decoder component 24. In other embodiments, the parallel LDA component 28 is omitted, as mentioned above.
Figures 4A to 4D show signals at different points in the system in one example.
Figure 4A shows 25 seconds of myoelectric signals recorded from the surface of the skin on the forearm of a participant whilst they performed different gestures in one example. The gestures of interest in the example are referred to as Pointer, Tripod, Open, Rest, Lateral and Power.
Figure 4B depicts two attributes (features) of the signals from each channel, namely waveform length and the logarithm of variance. Figure 4C illustrates the outputs of the deep learning component 35, which are six estimated distances, denoted D, to the six gestures in the task according to the example, including the rest posture.
At each time instant, the classifier 36 chooses the gesture with the lowest D as its output. During the experiment, in each trial participants were shown a target gesture which they had to perform and hold for 1 second. The system is not limited to six gestures and may incorporate a higher or lower number of gestures in other embodiments.
Figure 4D shows the classifier output, with the tick overlay representing the hold period. In some parts of the experiments, subjects were presented with a score between 0% and 100% at the end of each trial, which quantified the success of the participant in matching the decoder output to the instructed target gesture during the hold period.
Finally, Figure 4E shows a task state machine in relation to tasks that are the subject of Figure 4.
The system can provide real-time, time-varying output representing similarity between the gesture by the user and one or more desired gestures, wherein the real-time output varies as the muscle activity of the user, or other activity or measurement, varies in real-time. The real-time, time-varying output may represent similarity between the user's gesture and all of the other gestures that the machine learning model is trained with at the same time. The real-time output may comprise a plurality of real-time outputs, each representing similarity to one of a plurality of gestures that the machine learning model is trained with. The output represents the determined distance, or another measure determined from the determined distance, and hence also varies with time. This output may take the form of a number. Additionally, this number may represent the distance in the latent space. The real-time output or outputs may be smoothed when processed by the machine learning model, or as a process applied to the output or outputs of the machine learning model, or as a combination of the two. This smoothing may be achieved by using a Savitzky-Golay filter or any other suitable time-dependent filter.
Figure 5 shows one embodiment of an output of the system as shown to a user on the display screen 16, as distances between the user's state 54 and the centroids of classes 56A-56E representing gestures on a folded poly-plane. Each of the distances between the user's state 54 and the plurality of class centroids 56A-56E represents the magnitude of D. The positions of the user state 54 and the class centroids 56A-56E are preferably displayed in real-time and are hence time-varying in relation to the variation in gestures of the user.
In some embodiments, an intuitive user interface representing more than 4-5 classes may be provided. This is relevant for prosthetic control, and also other applications including metaverse applications. In complex environments and multifaceted interactions, user-friendly interfaces are particularly important. Such interfaces may accommodate an array of options, functions, and controls, ensuring that users can navigate and interact within these digital landscapes with reduced or minimal effort.
Figure 6 shows another embodiment of an output of the system as shown to a user on the display screen 16. In this embodiment, Figure 6(a) shows the target gesture to be achieved, while Figure 6(b) shows the feedback delivered to the user as a score. Figure 6(c) shows the distances D between the user's state and six centroids. The height of each bar in Figure 6(c) is proportional to the magnitude of the corresponding D, is preferably time-varying, and depicts in real-time the similarity of the input to one or more of the plurality of gestures. In other embodiments, the magnitude of D could be represented by other variation in the appearance of graphical indicators, such as the width, colour, position, size, texture or shape of the bars. Any other suitable graphical or other indicator may be used instead of, or as well as, bars. For example, blobs, colours, games, or any suitable regular or irregular shapes could be used as, or provide, indicators.
A single numerical value may be used to display the magnitude of D. Since D represents similarity to a corresponding gesture, the single numerical value may also represent similarity to a corresponding gesture. In this embodiment, the smallest D is for the Lateral gesture and the corresponding graphical indicator is denoted in green (i.e. in a different colour to the graphical indicators for the other gestures). In alternative embodiments, the graphical indicator for the closest gesture or other physical movement or shape can be distinguished from the graphical indicators for the other gestures or physical movements, for example by having a different shape, colour, size, shading or texture, or by making it visible while at least some of the other graphical indicators are not visible.
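Purely as an illustration of how such a display could be drawn (the actual rendering used in the embodiment is not specified here), a matplotlib sketch of the Figure 6(c)-style bar feedback, with the closest gesture highlighted in green:

```python
# Hedged sketch: one bar per gesture, height proportional to the estimated
# distance D, closest gesture drawn in a different colour.
import numpy as np
import matplotlib.pyplot as plt

def draw_feedback(D_hat, gestures):
    D_hat = np.asarray(D_hat)
    colours = ["tab:green" if i == int(np.argmin(D_hat)) else "tab:blue"
               for i in range(len(D_hat))]
    plt.cla()                              # clear the previous frame
    plt.bar(gestures, D_hat, color=colours)
    plt.ylabel("Estimated distance D")
    plt.pause(0.001)                       # brief non-blocking refresh
```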
Feedback in the form shown in Figure 5 and Figure 6 gives the user a real-time, time-varying output representing similarity between the user's movement and a set of predefined movements. This allows the user to adjust their movement to more closely match one or more of the predefined movements and helps to improve user movement.
Experiments conducted over two days serve to illustrate certain embodiments of the system as elaborated below.
On the first day, EMG training data was collected. Participants sat in front of a computer screen, on which pictures of five hand postures and the rest posture were shown. The postures included power, lateral, tripod, pointer, and hand open, as shown in Figure 7A.
For model training, the EMG data was collected. Participants were instructed to perform and hold each grasp ten times, each repetition being six seconds long. They had ten seconds to relax between repetitions. Finally, the same procedure was repeated for the rest posture. 80% of the recorded data were kept to train the linear discriminant analysis (LDA) decoder and the remaining data was used to test the decoder. Simple cross-validation was performed to avoid over-fitting. The model with the best testing score was picked.
Training data collection was followed by a Baseline block during which all participants used the LDA decoder component 22 to perform 20 trials. Each grasp was presented four times, in a pseudo-randomised order. After completing the baseline block, participants were assigned to one of three groups, namely LDA, DistaNet-S, and DistaNet-SD. The assignment was based on a moving average of the group average scores, that is participants were assigned such that the difference in mean baseline performance between the groups was minimised, as shown in Figure 8. Each training and test block contained 20 trials. Target grasp was presented in a pseudo-random order, such that each was experienced an equal number of times.
Trials started when participants relaxed their muscle activity. An audible beep from the speaker 108 signalled the start of the trial and one of the target grasps was presented. Trials were four seconds long, comprising two periods of three seconds and one second, referred to as reach and hold, as shown in Figure 7B. The start of the hold period and the end of the trial were also cued with two audible beeps from the speaker 108. Once a target grasp was presented, the reach period allowed the participant to change the posture of their hand and hold the grasp for one second. At the end of a typical trial, a score was presented in the centre of the screen. The score, expressed as a percentage, refers to the proportion of the hold period during which the output of the decoder matched the instructed grasp. An embodiment of such a display is shown in Figure 6. A two-second rest period was included between trials. Participants had longer rest periods between blocks, of around five minutes.
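As a hedged sketch of the scoring rule described above, assuming the decoder emits one label per analysis window during the hold period:

```python
# Hedged sketch: per-trial score as the proportion of the hold period during
# which the decoder output matched the instructed grasp.
def trial_score(decoder_outputs, target):
    matches = sum(1 for g in decoder_outputs if g == target)
    return 100.0 * matches / len(decoder_outputs)  # score in 0-100%
```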
Figure 7C represents the experiment design, including the type and the number of blocks during the two experimental days. Figure 7D reports the full arrangement of the experiment in terms of the decoder, and the availability of scores and visual feedback in the Practice and Test blocks. In summary, no feedback was provided in the Baseline block. In the Practice blocks, participants in the LDA and DistaNet-S groups received score feedback only and participants in the DistaNet-SD group received both score and distance visual feedback. The participants did not receive any feedback in the Test blocks. While time-varying visual feedback was used for the purpose of this experiment, the discussion is equally valid in the case of using auditory, tactile, thermoreceptive and olfactory feedback. In particular, the visual feedback may be in the form of bars, lines, regular shapes and irregular shapes and the like whose dimensions vary with time. Auditory feedback may utilise a particular notification or may modulate the pitch and other such properties of the sound used. Similarly, tactile feedback may take the form of pressure, touch or vibration feedback. In each of the feedback mechanisms described above, a property of the feedback signal delivered to a user varies according to the determined distance D. Other than a variation in dimension, the visual feedback mechanism may vary the size, shape, position, colour or texture of a visual element displayed. Accuracy, i.e. higher scores, in the Test blocks was used as a metric to quantify learning and retention of myoelectric skills.
The experiment was performed on an HP EliteBook 840 G8 laptop computer (2.6 GHz i5-1145G7 CPU, 16 GB RAM, HP Inc., California, U.S.), which in this example was used as the computing apparatus 12 that comprises a processing resource 18. Real-time experimental software was implemented in Python using the AxoPy library. This processing resource 18 is used to receive measurements from the at least one sensor, apply a dimensionality reduction process, train a machine learning model, provide inputs comprising a set of features based on the measurements to a trained machine learning model, and provide a real-time output to the user representing or determined from the determined distance. The dimensionality reduction process may be applied as part of the machine learning model, or it may be applied to the input before the input is sent to the machine learning model.
Eighteen adult limb-intact right-handed people (age range: 23-43) took part in this experiment. Three participants had previous experience with myoelectric control, of whom one was assigned to the LDA group, one to DistaNet-S, and one to DistaNet-SD. The assignment of participants to the experimental groups was otherwise governed by the baseline-matching statistics described above.
Eight channels of EMG signals were recorded with two Trigno Quattro sensors (Delsys, USA). Sensors were placed on the forearm, c. 2 cm below the elbow. Any number and type (electric, magnetic, capacitive, mechanical, etc.) of sensors may be used in other embodiments. A sensor based on any other technology that senses the activity of muscles may also be used. Starting from the extensor carpi ulnaris muscle, the electrodes were spread around the limb equidistantly. At least one sensor was configured to output measured signals on at least one measurement channel. The at least one sensor may be configured to sense the activity of at least ten muscles of interest. In other embodiments, the at least one sensor may sense the activity of between 5 and 50 muscles of interest. In other embodiments, the at least one sensor may sense the activity of between 15 and 35 muscles of interest.
The band-pass filtered [10-500 Hz] EMG signals were sampled at 2000 Hz. During the experiment, two features were extracted from each channel with a window size of 150 ms and an overlap of 100 ms. These features were waveform length (WL) and log-variance (log-var). In other embodiments, the features provided as inputs to the machine learning model may comprise at least one of amplitude, peak width or duration of a measurement signal. The WL feature quantifies the complexity of the signal waveform by calculating the cumulative length of the EMG signal in each window. The variance of the EMG signal indicates the contraction power in a non-linear way. The log-var feature linearises the variance. These features are calculated with

WL = \sum_{i=1}^{N-1} \left| x_{i+1} - x_i \right|

where x_i and x_{i+1} stand for two neighbouring samples during a window of length N samples, and

\text{log-var} = \log\left( \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2 \right)

where \mu is the mean of the sample values within the analysis window. Figure 7C shows example EMG signals and the extracted features. Our offline analyses indicated that the two features offer an acceptable trade-off between accuracy and computational complexity. Importantly, however, the methodology may be applied to any suitable type or number of EMG features.
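By way of a non-limiting illustration, the two features could be computed over sliding windows as follows; at 2000 Hz, a 150 ms window with 100 ms overlap corresponds to N = 300 samples with a step of 100 samples, and the remaining implementation details are assumptions:

```python
# Hedged sketch: waveform length and log-variance per channel over sliding windows.
import numpy as np

def waveform_length(x):
    return np.sum(np.abs(np.diff(x)))   # cumulative path length of the signal

def log_var(x):
    return np.log(np.var(x))            # linearised contraction power

def window_features(emg, n=300, step=100):
    # emg: (samples, channels) -> (windows, 2 * channels) feature matrix
    rows = []
    for start in range(0, emg.shape[0] - n + 1, step):
        w = emg[start:start + n]
        rows.append(np.concatenate([
            [waveform_length(w[:, c]) for c in range(w.shape[1])],
            [log_var(w[:, c]) for c in range(w.shape[1])]]))
    return np.asarray(rows)
```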
Eighteen participants in three independent groups took part in the experiment over two days. Group one used a linear discriminant analysis (LDA) decoder to make an instructed grasp with their muscle activity and hold the grasp for a certain period for a score between 0-100%, which was displayed at the end of a trial in the practice blocks only. We denote this group LDA. The second group used the DistaNet decoder 24. They also received the score feedback at the end of each trial, reflecting how well they performed in the trial. We denote this group DistaNet-S, where S stands for Score. The third group, DistaNet-SD, used the DistaNet decoder 24 and performed the same task, but during practice trials they received two feedback signals, namely the score per trial and a visual representation of D on a computer screen with six bars. One embodiment of such a display is shown in Figure 6(c). Each bar represented the distance between the user's state in the feature space, after dimensionality reduction, and the centroids of the gesture classes as well as the rest class.
Figure 8A shows the averaged real-time performance results for the practice and test blocks for the three groups. On day one, Group LDA started at 0.868 ± 0.033 (Baseline) and decreased to 0.822 ± 0.057 (Test block 2), with a middle test (Test block 1) result of 0.790 ± 0.049. Group DistaNet-S started at 0.875 ± 0.048 and decreased to 0.871 ± 0.045 but with a higher Test block 1 result of 0.938 ± 0.021. Group DistaNet-SD started at 0.873 ± 0.023 and then increased to 0.940 ± 0.018 but with a lower Test block 1 result of 0.879 ± 0.030. On day two, Group LDA started at a low score of 0.654 ± 0.068 (Test block 3) and finished at 0.664 ± 0.042 in Test block 4. Group DistaNet-S started at 0.771 ± 0.070 but improved significantly to 0.935 ± 0.024. Group DistaNet-SD started at a relatively high score of 0.857 ± 0.038 and further improved to 0.936 ± 0.023 in Test block 4. Noticeably, all groups exhibited a reduction in accuracy at the beginning of day 2. However, subjects in Group DistaNet-S and Group DistaNet-SD regained control quickly, though at different rates.
Figure 8B provides a clearer view of performance retention across Test blocks 2, 3, and 4. Group DistaNet-SD was significantly better than Group LDA (Mann Whitney test, n = 6, p = 0.04) in Test Block 3 at the start of day 2. In Test Block 4 (the end of day 2), participants in both groups DistaNet-SD and DistaNet-S outperformed those in Group LDA (both with p = 0.002). No difference was observed between DistaNet-SD and DistaNet-S in Test Block 4. We had predicted that participants in Group DistaNet-SD would exhibit the highest performance in both Practice and Test blocks, because they had access to both the distance D feedback and the trial score, which helped form an internal model of the task required in the Test blocks, where no feedback was presented. However, the improvement in the DistaNet-S group on the second day in both Test and Practice blocks was unexpected. Therefore, we asked whether the improvement in performance in group DistaNet-S was across all gestures.
Figure 9 is a graph showing average performance per group in a baseline block for the tests described in relation to Figures 7 and 8.
Figure 10A clearly shows that the improvement in the overall score comes primarily from the improvement in the decoding of the Pointer and Lateral gestures, as DistaNet-S shows a relatively stable performance in decoding the Open, Tripod, and Power gestures.
Figure 10B and 10C depict the confusion matrices and the low-dimensional space, in which the ellipses show the distribution of the samples according to a Gaussian mixture model, at the beginning (top) and the end of day 2 (bottom).
Figure 11 shows a flowchart that illustrates in overview the steps taken to train the neural network or other model according to certain embodiments. In step 60, multi-channel data is measured from sensors 14, wherein the data is preferably data representative of muscle activity. In step 62, a feature extraction process extracts features, for example waveform length and log-variance, from the raw sensor data.
In step 64, the dimensionality of the data is reduced, preferably by using LDA, so that it is converted to a continuous representation in a low-dimensional sub-space.
In step 66, pseudo-labels are created from the data to train the neural network. Optionally, these pseudo-labels are smoothed at this step.
In step 68, the neural network is trained to estimate distances between user state and class centroids in a low-dimensional sub-space.
Figure 12 shows a flowchart that illustrates the steps taken to obtain estimates of distance from a trained neural network or other model in certain embodiments.
In step 70, multi-channel data is measured from sensors 14, wherein the data is preferably data representative of muscle activity.
In step 72, a feature extraction process extracts features, for example waveform length and log-variance, from the raw sensor data.
In step 74, a trained neural network estimates distances between user state and class centroids, which represent similarity between user gestures and predefined gestures that the neural network has been trained with.
It is a feature of various embodiments that the distances computed after dimensionality reduction can still describe a higher-dimensional relationship between samples and labels within a 2-dimensional representation. The distance reduces the dimensionality again, onto a folded poly-plane, without losing the information between the class labels and the sample. This not only reduces the possibility of confusion but also increases the robustness of the feedback to users. Within each class, the distances are relatively stable and continuous after filtering, which promises more intuitive and understandable feedback to users. Furthermore, between classes, the transition of the filtered distances is more continuous and coherent. The amplitude range of the distances is more consistent, which means that the sample likely moves within a certain area between different label centroids in the user's imagination. From this point of view, the feedback (distance) is presented in 2 dimensions with 3-dimensional information embedded for each time step.
The TCN's core components in various embodiments are dilated and causal convolutions. Causal convolutions link the output at time step t only with prior-layer elements at or before t, emphasising the time-ordered nature of the input data. Unlike traditional methods that rely on many layers and large kernels for broader receptive fields, the TCN adopts dilated convolutions. These convolutions use expanding dilation factors in each layer, effectively increasing the receptive field. This enhances feature recognition and performance without requiring a bigger network, preventing issues like over-fitting, vanishing gradients, and excessive computation.
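As a hedged aside (the exact dilation schedule used in the embodiment may differ), the receptive field R of a stack of causal convolutions with kernel size k and per-layer dilation d_l grows as

R = 1 + \sum_{l} (k - 1)\, d_l

so that, for example, k = 7 with dilations 1, 1, 2, 2 across the four convolution layers of the two blocks gives R = 1 + 6 x (1 + 1 + 2 + 2) = 37 input steps, without any increase in network depth or kernel size.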
In human motor control and sports and exercise literature, the dichotomy between product and process has been a focal point of discussion for decades. This dichotomy reflects two distinct approaches to understanding and evaluating motor learning. The product perspective centres on the final outcome of the movement, such as achieving a particular performance goal. Proponents of the product-oriented approach argue that setting specific targets and measuring performance against them is essential for assessing progress and determining success. This viewpoint often aligns with the competitive nature of sports, as achieving desired outcomes is often the ultimate objective. On the other hand, the process perspective places emphasis on the quality of movement execution, the underlying mechanics, and the developmental journey individuals undertake to improve their skills. Advocates of this viewpoint argue that focusing on the process allows people to build a strong foundation, refine their techniques, and enhance their overall motor control. The process-oriented approach encourages practitioners to prioritise factors such as body awareness and movement efficiency. This perspective recognises that mastery is a result of consistent practice, deliberate refinement, and a deep understanding of the intricate nuances of movement patterns.
While both process and product perspectives play roles in motor skill acquisition and retention, focusing on the process tends to lead to more robust and enduring learning outcomes. In other words, when the training prioritises understanding the underlying mechanics and dynamics, e.g. with feedback, refining movement patterns and building a deeper level of skill acquisition can be achieved. This process-oriented approach involves cognitive engagement, which aids in encoding the motor skills into long-term memory, making them more likely to be retained over time. On the other hand, a sole focus on the product, while motivating in the short term, may lead to shortcuts and short-term winning strategies. This can hinder the long-term retention of the acquired motor skills.
Academic research in the field of human-machine interfaces and upper-limb prosthetics has traditionally leaned towards a product-oriented learning approach, given the short-term and laboratory-based nature of experiments. However, it is becoming increasingly evident that a shift towards a process-oriented perspective with the use of biofeedback is necessary to achieve more comprehensive and lasting outcomes. Embodiments described herein highlight the significance of grasp-specific distance biofeedback, emphasising that integrating biofeedback mechanisms can greatly enhance learning outcomes and promote the retention of motor skills. Nevertheless, the tension between these two perspectives is not a contradiction but rather a delicate balance. In practice, effective motor learning and skill acquisition often involve a symbiotic relationship between process (biofeedback) and product (score).
It is hypothesised herein that intuitive and smooth biofeedback enhances the likelihood of retention and sustainable success, while a focus on achieving goals can provide motivation and a sense of accomplishment. In the experiments, which spanned two days, it was sought to examine the impact of grasp-specific distance biofeedback on the retention of motor skills. The findings showed a promising avenue for enhancing skill retention by utilising feedback tailored to specific grasps in the control space. Importantly, it is demonstrated that the DistaNet-SD group achieved higher levels of control compared to the LDA group on the second day, suggesting the potential of this approach for more effective myoelectric interface performance. Further investigation to determine how much time will pass before individuals begin to forget the decoder-based control, despite retaining the intuitive motor strategies they have developed, is of potential interest.
DistaNet-SD underscores the significance of process-oriented feedback in facilitating exceptional myoelectric interface performance. By providing users with an intuitive and coherent representation of the control manifold, DistaNet-SD bridges the gap between the myoelectric signal domain and the ideal grasp space within the control dimension. This seamless linkage enables users to establish an innate understanding of the relationship between their physical movements and their positions within the control space, fostering an embodied comprehension of the motor skill execution process. Notably, DistaNet-SD presents users with both process-oriented feedback, in the form of bars displayed on the screen, and product-oriented information, in the form of a score. This dual-layered feedback approach seems to play a significant role in achieving exceptional results. The training phase demonstrated the highest accuracy rates, further validating the effectiveness of the process-oriented information. During testing, DistaNet-SD exhibited the highest retention levels. Further studies of process-oriented feedback and learning involving deliberate and variable practice, which exposes individuals to different contexts and challenges related to the skill, may be informative in future multi-day experiments with myoelectric control users. Such intentional variability can be achieved with gamification.
With regard to the DistaNet-S group, initially, on the first day, this group exhibited a performance level that aligned with expectations, given that they received less feedback than the DistaNet-SD group (which received both score and distance information) and that DistaNet is a more accurate decoder than the LDA decoder. However, on the second day the DistaNet-S participants displayed a distinct pattern: their performance scores dipped initially, not as drastically as those of the LDA group, but remarkably rebounded. This rapid recovery was followed by an observation during the final test block, wherein the DistaNet-S participants showcased a performance equivalent to that of the DistaNet-SD group, who had access to detailed feedback. This observation underscores the complex interplay between feedback provision and myoelectric skill retention, suggesting that if the noise in decoding can be contained with an efficient decoder, such as the decoder according to embodiments, certain intermediate levels of feedback, albeit product feedback, can be enough to facilitate internalisation of myoelectric skill. Further exploration of such results can uncover insights into the cognitive and neuromuscular mechanisms governing myoelectric skill learning and retention.
The system in various embodiments may be applied, for example, in improving user movement, metaverse interfaces, robotics control, exoskeleton control, prosthetics, fatigue detection, monitoring of the body during exercise, rehabilitation and the like, and in training user movement for such activities. It is a feature of various embodiments that the system and method can be agnostic to the choice of dimensionality reduction method.
According to various embodiments there is provided a platform wherein the or a machine learning model is trained to determine distance in a latent space that is of lower dimensionality than a dimensionality of the or a set of input features, and the distance represents closeness of a position to a target gesture. The methods and systems may, for example, be applied to any time series data, and may enhance the retention of motor skills by reducing or minimizing classification noise.
Although description of particular embodiments has been provided above, it should be understood that these embodiments are illustrative only and that the claims are not limited to those embodiments. Those skilled in the art will be able to make modifications and alternatives to the described embodiments which are contemplated as falling within the scope of the appended claims. Each feature disclosed or illustrated in the present specification may be incorporated in any embodiment, whether alone or in any appropriate combination with any other feature disclosed or illustrated herein. In particular, one of ordinary skill in the art will understand that one or more of the features of the embodiments of the present disclosure described above with reference to the drawings may produce effects or provide advantages when used in isolation from one or more of the other features of the embodiments of the present disclosure and that different combinations of the features are possible other than the specific combinations of the features of the embodiments of the present disclosure described above.

Claims

1. A system for monitoring user gestures or other movement or state, comprising: at least one sensor configured to perform measurements of muscle, nerve and/or brain activity of a user; a processing resource configured to: receive measurements from the at least one sensor; provide inputs comprising a set of features based on the measurements to a trained machine learning model, wherein the machine learning model is trained to determine distance in a latent space that is of lower dimensionality than a dimensionality of the set of input features, and the distance represents closeness of a position to a centroid or other position in the latent space that corresponds to a classification as a respective one of a plurality of different gestures on which the machine learning model is trained; and provide an output to the user representing or determined from the determined distance, thereby representing similarity between the gesture or other movement or state by the user and one or more gestures or other movements or states of interest, wherein the output varies as the muscle, nerve and/or brain activity of the user varies.
2. A system according to claim 1, wherein the position whose closeness is determined comprises a position corresponding to features determined from measurements by the or a sensor.
3. A system according to claim 1 or 2, wherein the output comprises a real-time output.
4. A system according to any preceding claim, wherein the output represents similarity to a selected one of the plurality of gestures, movements or states or to each of the plurality of gestures, movements or states.
5. A system according to any preceding claim, wherein the output includes a plurality of time-varying outputs, each time-varying output representing similarity to a respective one of the plurality of gestures or other movements or states.
6. A system according to claim 5, wherein the time-varying output or each of the time-varying outputs comprises or represents a respective single numerical value that represents similarity to a corresponding gesture, movement or state.
7. A system according to claim 6, wherein each single numerical value represents distance in the latent space.
8. A system according to any preceding claim, wherein the distance in the latent space is represented by a suitable metric and/or comprises a Euclidean distance, a Jaccard distance, a Hamming distance, a Chebyshev distance or a Minkowski distance.
9. A system according to any preceding claim, wherein the processing resource is configured to apply a dimensionality reduction process, either included in the machine learning model or as part of a pre-processing step before input of the features to the machine learning model.
10. A system according to claim 9, wherein the dimensionality reduction process comprises application of at least one dimensionality reduction method and/or comprises at least one of Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA), Locally Linear Embedding (LLE), Laplacian eigenmaps (LE), t-distributed stochastic neighbour embedding (t-SNE), or Isomap.
11. A system according to any preceding claim, wherein the processing resource is configured to apply a smoothing process to smooth the real-time output, wherein the smoothing process is included in the machine learning model and/or as a post-processing step applied to the output of the machine learning model.
12. A system according to claim 11, wherein the smoothing process comprises a Savitzky-Golay filter or other time-dependent filter.
13. A system according to any preceding claim, wherein the model comprises a temporal convolutional network (TCN) and is configured to receive features in the lower dimensionality latent space and to output estimated distances to the centroids or other positions of the classifications of the gestures, movements or states.
14. A system according to any preceding claim, wherein the output comprises at least one graphical indicator whose appearance varies depending on the determined distance(s).
15. A system according to claim 14, wherein the varying of appearance comprises at least one of varying size, shape, position, colour or texture.
16. A system according to any preceding claim, wherein the at least one sensor is configured to output measurement signals on a plurality of measurement channels and the inputs to the machine learning model are obtained based on the measurement signals on the plurality of channels.
17. A system according to any preceding claim, wherein the at least one sensor is configured to sense activity of at least 5 muscles of interest, optionally between 10 and 50 muscles of interest, optionally between 15 and 35 muscles of interest.
18. A system according to any preceding claim, wherein the set of features provided as inputs comprises at least one of an amplitude, peak width or duration of a measurement signal.
19. A system according to any preceding claim, wherein the at least one sensor comprises at least one muscle activity sensor or nerve activity sensor.
20. A system according to any preceding claim, wherein the gesture(s) comprise hand gesture(s) and/or facial gesture(s).
21. A system according to any preceding claim, wherein the machine learning model is trained based on pseudo-labels for training examples of the different gestures, movements or states, the pseudo-labels representing or being determined from distance from the centroid or other position in the latent space for the gestures, movements or states.
22. A method of monitoring user movement, comprising: receiving measurements of muscle, nerve and/or brain activity of a user; providing inputs comprising a set of features based on the measurements to a trained machine learning model, wherein the machine learning model is trained to determine distance in a latent space that is of lower dimensionality than a dimensionality of the set of input features, and the distance represents closeness of a position to a centroid or other position in the latent space that corresponds to a classification as a respective one of a plurality of different gestures, movements or states on which the machine learning model is trained; and providing an output to the user representing or determined from the determined distance, thereby representing similarity between the gesture, movement or state by the user and one or more gestures, movements or states of interest, wherein the output varies as the muscle, nerve and/or brain activity of the user varies.
23. A system for training a machine learning model for monitoring user gestures or other movement or state, comprising a processing resource configured to: receive a set of measurements of muscle, nerve and/or brain activity of a plurality of subjects, each measurement corresponding to a known gesture, movement or state of a set of gestures, movements or states by one of the subjects; for each measurement provide an input comprising sets of features based on the measurement to a dimensionality reduction component that is configured to provide outputs of lower dimensionality than the sets of inputs; for each dimensionally-reduced output determine a label that represents the closeness of a position to a centroid or other position in a latent space that corresponds to a classification of the gesture, movement or state to which the output corresponds; and, using the labels, train the machine learning model to determine, from an input that comprises features determined from measurements, an output that comprises or represents distance to one or more gestures, movements or states of the set of gestures, movements or states.
24. A system according to claim 23, wherein the dimensionality reduction comprises application of at least one dimensionality reduction method and/or comprises at least one of Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA), Locally Linear Embedding (LLE), Laplacian eigenmaps (LE), t-distributed stochastic neighbour embedding (t-SNE), or Isomap.
25. A system according to claim 23 or 24, wherein the processing resource is configured to apply a smoothing process to smooth the time-varying dimensionally-reduced outputs and/or to smooth the time-varying labels.
26. A system according to claim 25, wherein the smoothing process comprises a Savitzky-Golay filter or other time-dependent filter.
27. A system according to any of claims 23 to 26, wherein the model comprises a temporal convolutional network (TCN).
28. A method of training a machine learning model, comprising: receiving a set of measurements of muscle, nerve and/or brain activity of a plurality of subjects, each measurement corresponding to a known gesture, movement or state of a set of gestures, movements or states by one of the subjects; for each measurement providing an input comprising sets of features based on the measurement to a dimensionality reduction component that is configured to provide outputs of lower dimensionality than the sets of inputs; for each dimensionally-reduced output determining a label that represents the closeness of a position to a centroid or other position in a latent space that corresponds to a classification of the gesture, movement or state to which the output corresponds; and, using the labels, training the machine learning model to determine, from an input that comprises features determined from measurements, an output that comprises or represents distance to one or more gestures, movements or states of the set of gestures, movements or states.
29. A computer program product comprising computer-readable instructions that are executable to perform a method according to claim 22 or 28.
30. A trained machine learning model, trained to determine distance in a latent space that is of lower dimensionality than a dimensionality of a set of input features, wherein the distance represents closeness of a position, for example corresponding to features determined from measurements by a sensor, to a centroid or other position in the latent space that corresponds to a classification as a respective one of a plurality of different gestures, movements or states on which the machine learning model is trained.

Applications Claiming Priority (2)

GBGB2314045.2A (published as GB202314045D0), priority date 2023-09-14, filing date 2023-09-14: A system for monitoring user gestures or other movement
GB2314045.2, priority date 2023-09-14

Publications (1)

Publication Number: WO2025056916A1 (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090327171A1 (en) 2008-06-26 2009-12-31 Microsoft Corporation Recognizing gestures from forearm emg signals
US8761437B2 (en) 2011-02-18 2014-06-24 Microsoft Corporation Motion recognition
CN204009751U (en) 2014-08-04 2014-12-10 南京信息工程大学 Gesture identifying device
US10754434B2 (en) 2016-04-01 2020-08-25 Intel Corporation Motion gesture capture by selecting classifier model from pose
WO2020111344A1 (en) 2018-11-30 2020-06-04 이창훈 System and method for implementing physical motion in virtual space by using electromyographic signal
US20200275895A1 (en) 2019-02-28 2020-09-03 Facebook Technologies, Llc Methods and apparatus for unsupervised one-shot machine learning for classification of human gestures and estimation of applied forces

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DE MONTALIVET ETIENNE ET AL: "Guiding the Training of Users With a Pattern Similarity Biofeedback to Improve the Performance of Myoelectric Pattern Recognition", IEEE TRANSACTIONS ON NEURAL SYSTEMS AND REHABILITATION ENGINEERING, IEEE, USA, vol. 28, no. 8, 16 June 2020 (2020-06-16), pages 1731 - 1741, XP011802321, ISSN: 1534-4320, [retrieved on 20200806], DOI: 10.1109/TNSRE.2020.3003077 *
XIONG DEZHEN ET AL: "Deep Learning for EMG-based Human-Machine Interaction: A Review", IEEE/CAA JOURNAL OF AUTOMATICA SINICA, CHINESE ASSOCIATION OF AUTOMATION (CAA), vol. 8, no. 3, 3 February 2021 (2021-02-03), pages 512 - 533, XP011834896, ISSN: 2329-9266, [retrieved on 20210203], DOI: 10.1109/JAS.2021.1003865 *



Legal Events

Code 121: Ep: the epo has been informed by wipo that ep was designated in this application. Ref document number: 24782333; Country of ref document: EP; Kind code of ref document: A1.

Kind code of ref document: A1