
WO2025062034A1 - Feature extraction and encoding of spiking neural networks using convolutional neural network and trainable encoders for deployment in neuromorphic chips - Google Patents


Info

Publication number: WO2025062034A1
Authority: WO (WIPO, PCT)
Prior art keywords: sensor, parameters, neural network, encoder, pipeline
Legal status: Pending
Application number: PCT/EP2024/076619
Other languages: French (fr)
Inventors: Rafael Javier PÉREZ BELIZÓN, Vasile TOMA-II, Sebastian Eusebiu Nagy
Current Assignee: Innatera Nanosystems BV
Original Assignee: Innatera Nanosystems BV
Application filed by Innatera Nanosystems BV


Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N3/045: Combinations of networks
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G06N3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063: Physical realisation using electronic means
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent

Definitions

  • the neuron output signal will include a spike when the integrated value (referred to as the membrane potential) of the neuron reaches a predetermined threshold value. When the threshold is reached, the neuron fires, generating a spike (i.e. a voltage or current spike) at the neuron’s output, and reducing the membrane potential.
  • Each neuron is thus configured to generate a neuron output signal 14, 15, 16 in the form of a spatio-temporal spike train.
  • the neuron output signal 14, 15, 16 depends on the parameters of the neuron, such as the input gain, integration constant and threshold value.
  • Each synaptic element 17, 18, also referred to as a synapse, receives an output signal from one of the neurons 10, 11.
  • the synapses 17, 18 amplify or attenuate the received output signal by a predetermined factor determined by their weight parameter, which is configurable.
  • the weight of a synapse may be positive so that a synaptic output signal received from that synapse excites the neurons which receive the signal, raising their membrane potentials.
  • the weight may be negative, which inhibits the neurons which receive a synaptic output from that synapse, potentially lowering their membrane potentials. Or the weight may be zero, which effectively removes the synaptic connection between the two neurons connected via the synapse.
  • the weight for each synapse may be stored in a memory cell associated with the synapse.
  • FIG. 2 is a simplified schematic diagram of synapse connections implemented as a crossbar array.
  • a crossbar design is an efficient way of implementing a reconfigurable neural network, especially when manufactured on an integrated circuit.
  • the design in FIG.2 includes a rectangular array of synapses 17 used to interconnect two layers of the neural network, e.g. synapses 17 connecting neurons 10 on one side of the array to neurons 11 on another side, or synapses 18 connecting neurons 11 on one side of the array to neurons 12 on another side.
  • neurons 10 are arranged in one column, each driving a row of synapses 17.
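As an illustration of the computation such an array performs (an assumption for exposition, not text from the patent), the crossbar can be modelled as a matrix-vector product: each row of the weight matrix is driven by one neuron, and each column accumulates the weighted events for one receiving neuron.

```python
import numpy as np

# Hypothetical 3-input, 4-output crossbar: W[i, j] is the weight of the
# synapse connecting driving neuron i to receiving neuron j.
W = np.array([[ 0.8, -0.2, 0.0,  0.5],
              [ 0.1,  0.9, 0.3, -0.4],
              [-0.6,  0.0, 0.7,  0.2]])

spikes = np.array([1, 0, 1])    # events emitted by the driving neurons

# Each receiving neuron integrates the weighted events on its column;
# the result would be added to the receiving neurons' membrane potentials.
synaptic_input = spikes @ W     # shape (4,)
```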
  • the neuromorphic processor of the present invention comprises one or more convolution neural networks (CNNs) and one or more spiking neural networks (SNNs).
  • the training may happen on e.g. a PC, using a simulator of the neuromorphic hardware.
  • the simulator may not have to fully capture the elements and dynamics of the hardware.
  • the networks may be deployed onto the hardware after training. The training may also be performed on the neuromorphic processor.
  • the invention provides for joint training of the CNNs and SNNs as described herein. After training the pipeline, the trained pipeline is deployed on the neuromorphic processor.
  • the CNNs provide task-dependent, hierarchical, translation-invariant feature extraction from the input signal.
  • An encoder is provided to transform the output from the CNNs representing the extracted features, into temporal-coded events for input to the SNNs.
  • the SNNs provide efficient low-power inference, e.g. classification, of the extracted features.
  • the combination thus provides an efficient neuromorphic processor for inference of input data, comprising e.g. classification, feature extraction and/or signal processing.
  • the invention provides the computing paradigm for a “fully differentiable” flow to create the events based on the output produced by the CNNs.
  • fully differentiable in this context means that the derivative or an approximation of the derivative exists at every point in its domain.
  • fully differentiable includes finding approximate or surrogate gradients for the encoder if it is not fully differentiable per se. This matters because a fully differentiable pipeline is a necessary condition for, for example, applying the chain rule to compute gradients when training with a gradient descent method; a minimal sketch follows below.
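By way of illustration (not taken from the patent), a straight-through estimator can be written as a custom autograd function whose forward pass is the hard binarisation and whose backward pass substitutes a clipped-identity surrogate gradient:

```python
import torch

class STEBinarize(torch.autograd.Function):
    """Forward: hard threshold to {0, 1}. Backward: surrogate gradient."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return (x > 0).float()              # non-differentiable step

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Clipped-identity surrogate: pass the gradient through where
        # |x| <= 1, a continuous relaxation of the step's zero-a.e. derivative.
        return grad_output * (x.abs() <= 1).float()

x = torch.randn(8, requires_grad=True)
y = STEBinarize.apply(x)                    # binary output in the forward pass
y.sum().backward()                          # x.grad is populated via the surrogate
```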
  • Joint training of the CNNs and SNNs enables the CNNs to be trained while taking into account internal dynamics of the SNNs. In prior methods, the CNN and SNN are trained independently, since they lack a differentiable encoder that can enable directly training CNNs and SNNs together.
  • the input data typically originates from a sensor, and typically comprises real values, either analog (graded) values or a digital representation of analog values, e.g. two’s complement, signed-magnitude, etc. Whilst these digital representations are also binary, they do not take into account the spiking neurons’ dynamics, so trying to encode information using them would not take advantage of the SNN’s capabilities.
  • the neuromorphic processor of the present invention can use gradient descent (GD) or another learning algorithm to account for how these real-valued features should be translated into temporal-coded binary events for the SNN.
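As a simple, non-learned baseline for such a translation (illustrative only; the point of the invention is that the encoding can instead be optimised jointly), a rate encoder maps each real-valued feature to a spike train whose firing probability is proportional to the feature's magnitude:

```python
import numpy as np

def rate_encode(features, n_steps=20, seed=0):
    """Encode real values in [0, 1] as binary events over n_steps time steps.

    Returns an array of shape (n_steps, len(features)) in which the expected
    spike count of each channel is proportional to its feature value.
    """
    rng = np.random.default_rng(seed)
    p = np.clip(features, 0.0, 1.0)                    # firing probability
    return (rng.random((n_steps, len(features))) < p).astype(np.uint8)

spike_trains = rate_encode(np.array([0.1, 0.5, 0.9]))  # input for the SNN
```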
  • the trainable parameters of the CNN, encoder, decoder etc. are trained by performing a backward pass of training data through the pipeline, for joint training of the CNN 31S and SNN 33S, and optionally also the encoder 32S and/or the decoder 34S.
  • the loss 45 is used to compute backpropagated gradients 46 for use in updating the parameters of the SNN 33S and CNN 31S (and optionally the encoder 32S and/or the decoder 34S), using the chain rule.
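Putting these pieces together, the following is a condensed, hypothetical PyTorch sketch of one such joint update through a CNN, an STE-style encoder and a toy surrogate-gradient SNN layer. All module shapes, names and hyperparameters are illustrative assumptions, not the patent's implementation; the STEBinarize function is repeated from the earlier sketch for self-containment.

```python
import torch
import torch.nn as nn

class STEBinarize(torch.autograd.Function):
    """Hard threshold forward, clipped-identity surrogate backward."""
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return (x > 0).float()

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * (x.abs() <= 1).float()

class ToySNN(nn.Module):
    """One layer of leaky integrate-and-fire neurons unrolled over time,
    reusing the surrogate gradient for the spike nonlinearity."""
    def __init__(self, n_in, n_out, leak=0.9, threshold=1.0):
        super().__init__()
        self.w = nn.Linear(n_in, n_out, bias=False)   # synaptic weights
        self.leak, self.threshold = leak, threshold

    def forward(self, spikes):                        # spikes: (T, B, n_in)
        v = torch.zeros(spikes.shape[1], self.w.out_features)
        out = []
        for s in spikes:                              # step through time
            v = self.leak * v + self.w(s)             # integrate input events
            fired = STEBinarize.apply(v - self.threshold)
            v = v * (1.0 - fired)                     # reset on firing
            out.append(fired)
        return torch.stack(out)                       # (T, B, n_out)

cnn = nn.Sequential(nn.Conv1d(1, 4, 5, padding=2), nn.ReLU(),
                    nn.AdaptiveAvgPool1d(8), nn.Flatten())   # -> 32 features
snn = ToySNN(32, 2)
opt = torch.optim.SGD(list(cnn.parameters()) + list(snn.parameters()), lr=1e-2)

x = torch.randn(16, 1, 100)                 # batch of raw 1-D sensor windows
target = torch.randint(0, 2, (16,))

feats = cnn(x)                              # real-valued extracted features
spikes = torch.stack([STEBinarize.apply(feats - t / 10.0)
                      for t in range(10)])  # toy temporal encoding, T = 10
out = snn(spikes)                           # spiking output
logits = out.sum(dim=0)                     # decode: spike count per class
loss = nn.functional.cross_entropy(logits, target)

opt.zero_grad()
loss.backward()             # gradients flow back through SNN, encoder and CNN
opt.step()                  # joint CNN + SNN parameter update
```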
  • the processed data is passed to an inference module 38, which interprets the decoded signals and e.g. assigns the input data to one of several predefined classes.
  • the inference module can output e.g. a scalar value for regression tasks.
  • the inference module’s decision is based on the output from the SNN.
  • the final output is an inference result, for example a classification result, which could be a label, category, or probability distribution, depending on the specific application.
  • This end-to-end pipeline allows for efficient processing of sensor data, enabling robust inference of complex inputs.
  • Hardware architecture to efficiently deploy CNN-SNN pipelines may entail starting with modelling of, mainly, the physical properties of the SNN 33, which are normally given by a differential equation or by direct measurements in the case of analog SNNs. Furthermore, the quantisation present in the synaptic part of the synapse-neuron complex may be taken into account. [0085] One may model the input/output constraints within the encoders, but in such a way that these constraints are present in the backward pass only, allowing the training to converge to a solution that does not hit these constraints.
  • FIG.4 is a block diagram of a possible hardware implementation of a neuromorphic processor 30 enabling joint training of CNN 31 and SNN 33 as described above.
  • the architecture presented below includes the previously discussed computational blocks, i.e. hardware accelerators implementing the CNN 31, encoder 32, SNN 33, and decoder 34.
  • the microprocessor 53 acts as an orchestrator of data movements, between internal computational elements 31-34 and external data acquisition or communication, and manipulations that are not supported or optimal for the other accelerators present on chip. This provides for flexibility and efficiency. [0089] Flexibility is enabled by allowing for arbitrary data movements through the main memory 54.
  • the output from CNN 31 can be transformed by encoder 32 directly on its path to the SNN 33, or the output can be sent to main memory 54, buffered or otherwise processed by the microprocessor 53, and then sent as input to the SNN 33.
  • Efficiency is provided through multiple means. Firstly, the output of the CNN 31 is a relatively compressed data representation: it represents information at a higher level of abstraction that is more separable than that produced by earlier layers. Secondly, the SNN 33 natively consumes and produces events which are sparse in time and space. [0091] For these reasons, the benefits of the design lie with the interconnect 52 and the data manipulation operations performed between the CNN 31 and SNN 33.
  • the CNN 31 is configured either statically or dynamically, with parameters stored in memory, on or off chip, by the microprocessor 53. It is fed input data from the main memory 54, and its outputs are either sent to the encoder 32 so that they are transformed into events, or sent to memory 54, processed (e.g. serially or in parallel) by the microprocessor 53 to generate the events, and subsequently sent to the SNN 33.
  • the hardware barriers 57 may be used to modulate the data transfer from the DMA 56 to the encoder 32. This arbitration is based on the availability of the data in the DMA 56 and on whether the encoder 32 needs to finish preexisting processes.
  • the SNN 33 is similarly configured either statically or dynamically, with parameters stored in memory, on or off chip, by the microprocessor 53. Once the SNN 33 receives the encoded input events, it performs an efficient temporal integration of the spatial features extracted by the CNN 31 from the input signal. The output of the SNN 33 is decoded, either by a dedicated decoder 34 or in software by the microprocessor 53.
  • FIG. 5 is a block diagram illustrating an example of how the data can flow in the neuromorphic processor 30 of the FIG.3 and 4 embodiment.
  • the figure is divided into two parts: a hardware side 60B and a simulation side 60A.
  • the simulation side shows a schematic diagram of a forward pass and a backward pass in a simulated neuromorphic processor comprising a pipeline with both CNNs and SNNs.
  • the pipeline receives an n-dimensional input 40 and processes the input in a forward pass through the pipeline to identify one or more features of the input 40.
  • the CNN 31S performs a convolution operation on the input data 40 to project the input data to a higher- or lower-dimensional space, and extracts features from the input data.
  • the CNN 31S generates output 41 representing the extracted features in the real domain, so the output 41 is subsequently fed to the encoder 32S so that it can be processed by the SNN 33S.
  • the encoder 32S takes the output values 41 (that are real values) and transforms them into binary events in time, i.e. a series of temporal-coded binary spikes 42, which are suitable for input to the SNN 33S.
  • the SNNs 33S integrate the input events 42 and generate a spiking output 43 which is fed into decoder 34S.
  • the decoder 34S is an algorithmic block that converts the binary spiking output 43 of the SNNs 33S into real values 44.
  • the output 44 generated by the decoder 34S is employed to compute the loss 45 using a loss function (for example, a mean squared error function).
  • the pipeline is trained by performing a backward pass of training data through the pipeline, for joint training of the CNN 31S and SNN 33S, and optionally also the encoder 32S. After the loss 45 is computed, in this example, backpropagation starts.
  • the loss 45 is used to compute backpropagated gradients 46 for use in updating the parameters of the SNN 33S and CNN 31S (and optionally the encoder 32S and/or the decoder), using the chain rule as described above.
  • Parameters for the CNN can comprise for example convolutional filters (weights), biases, fully connected layer weights, and/or batch normalization parameters.
  • Parameters for the encoder can comprise for example weights and biases of encoder layers, and/or parameters of learnable transformations.
  • Parameters for the SNN can comprise for example synaptic weights, membrane potential threshold, STDP parameters (if using spike-timing learning), and/or neuron and synapse dynamics parameters.
  • Parameters of the SNN in the simulated pipeline can be set based on a modelling of the physical dynamics of the SNN comprised in the hardware implementation. Likewise, input constraints of the simulated encoder can be modelled based on the encoder accelerators present in the hardware implementation.
  • the hardware pipeline is the same as discussed for FIG. 3, with more details given about an exemplary implementation. [00103] First the hardware is configured and initialised (see box B) by the microprocessor 53, by obtaining from memory 54 the parameters produced by the forward and backward passes through the simulated pipeline, and initialising e.g. the CNN 31, SNN 33, encoders 32, and decoders 34 on the basis of the obtained parameters.
  • the hardware pipeline may comprise a sensor, one or multiple CNNs, one or multiple encoders, one or multiple SNNs, one or multiple decoders, and an inference module 38.
  • the inference process begins with data acquisition of sensor input data 36, either directly from a sensor or from sensor input data stored in memory 54. The data first undergoes a feature extraction process using one or multiple CNNs.
  • the CNN 31 processes the raw sensor data by applying convolutional filters, which identify and extract important patterns and features.
  • the Direct Memory Access (DMA) 56 may play a role in efficient data transfer within the pipelines involving sensors, CNNs, SNNs and the encoders and decoders.
  • the DMA 56 can optimize how data moves between components.
  • the microprocessor (e.g. CPU or GPU) typically manages the movement of data between the sensor (or any peripheral) and the memory. This requires the microprocessor’s intervention, which consumes processing cycles and may create a bottleneck, especially with large datasets like images or sensor streams.
  • Human Activity Recognition with an Inertial Measurement Unit (IMU)
  • An IMU is a composite of accelerometer, gyroscope and sometimes magnetometer sensors, which are capable of measuring acceleration, angular velocity and orientation respectively. IMUs, if worn by a human using some wearable (such as a smartwatch), can provide enough insight to detect or classify different tasks that are carried out by humans on a daily basis. IMU data is highly temporal, meaning that its information is contained not only in an event happening, but also in the event’s relative timing with respect to other events. As previously established, this is a scenario at which SNNs excel.
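As a hedged sketch of how such an IMU stream might be prepared for the pipeline described above, the sampling rate, window length and channel layout below are illustrative assumptions, not values from the patent:

```python
import numpy as np

def window_imu(stream, fs=50, win_s=2.0, hop_s=1.0):
    """Slice a (samples, channels) IMU stream (e.g. 3-axis accelerometer +
    3-axis gyroscope = 6 channels) into overlapping windows suitable as
    CNN input, preserving the relative timing of events inside each window.
    """
    win, hop = int(fs * win_s), int(fs * hop_s)
    starts = range(0, len(stream) - win + 1, hop)
    return np.stack([stream[s:s + win].T for s in starts])  # (N, ch, win)

stream = np.random.randn(1000, 6)   # 20 s of fake 6-channel IMU data
windows = window_imu(stream)        # each window -> CNN -> encoder -> SNN
```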

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)
  • Error Detection And Correction (AREA)
  • Image Processing (AREA)

Abstract

A method of operating a neuromorphic processor comprising a convolutional neural network adapted for receiving an input signal and generating corresponding real values, an encoder connected to receive the real values and generate corresponding temporal-coded binary values using an encoding function, and a spiking neural network connected to receive the temporal-coded binary values and generate a corresponding spiking output signal. The method comprises computing a loss based on the spiking output signal, computing backpropagated gradients based on the loss, transforming the backpropagated gradients using a fully differentiable surrogate function that is a surrogate of the encoding function, and updating one or more parameters of the spiking neural network and one or more parameters of the convolutional neural network based on the transformed gradients.

Description

Feature extraction and encoding of Spiking Neural Networks using Convolutional Neural Network and trainable encoders for deployment in neuromorphic chips

TECHNICAL FIELD

[001] The present invention relates to a neuromorphic processor, and more particularly, to systems and methods for joint training of a convolutional neural network and spiking neural network for deployment in neuromorphic chips such as in a neuromorphic processor.

BACKGROUND

[002] Convolutional Neural Networks (CNNs) are a type of feed-forward Artificial Neural Networks (ANNs) that make use of the convolution operation to repeatedly apply kernels to any kind of data. See Chollet, F. (2021), Deep learning with Python, Simon and Schuster. When combined with training algorithms such as Gradient Descent (GD) or other learning algorithms, CNNs are an efficient way of identifying localised variations in the data that are relevant for a specific task, i.e. they are powerful data-driven feature extractors. CNNs are often employed along with a maximum pooling or average pooling operation. The former takes the maximum value within a kernel of a specified size and the latter averages all the values of the kernel. The combination of CNNs and pooling operations may be used to achieve a very desirable property, namely translation-invariance, i.e., a shift in the input data will not alter the outcome of the model. From now on, we will refer to the combined CNN-pooling operations as “CNN” or “CNNs” for simplicity. Additionally, CNN layers can be stacked on top of each other hierarchically, which allows them to detect frequency-related or higher-level features. These benefits establish CNNs as translation-invariant, hierarchical feature extractors, making them appealing to the embedded AI industry thanks to the development of efficient hardware accelerators.

[003] Drawing inspiration from the brain, Spiking Neural Networks (SNNs) are neural networks which can receive binary inputs and produce binary outputs, in which each synapse-neuron complex can be conceptualised as a small processing unit with memory. The synapses scale the input events (spikes) by multiplying them by a weight value. Then, the synapse output is added to the neuron’s membrane potential. If a threshold is reached, the information (another spike) propagates to the next one or more synapse-neuron complexes, if there are any. If not, the information or a modified version of it remains in the neuron’s membrane potential until further spikes are received, scaled, and then summed to the membrane potential of the neuron. Modification includes a leak that can be e.g. linear or exponential, and/or a bias. Since the neuron keeps track of previous input spikes (events) that it has received, it has memory. Once the integrated events reach the neuron’s threshold, the memory (membrane potential) is reset and the process starts again immediately or after some time period (refractory period). The neurons of SNNs are also small processing units since their membrane potential has internal dynamics that can be characterised by, for example, dynamical systems theory. Their membrane potential follows certain physical principles that perform in-memory computation, i.e. modify the membrane potential. Some examples of in-memory processing would be integration of the input events, having some sort of membrane potential decay function over time, the way the membrane potential is reset, the refractory period, etc. Stemming from these intrinsic characteristics, SNNs are recognised as a potent pattern recognition solution for embedded AI, since efficient hardware implementations exist, namely the neuromorphic chips.
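To make these membrane dynamics concrete, the following is a minimal Python sketch of one leaky integrate-and-fire synapse-neuron complex; the weight, leak, threshold and refractory values are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def lif_neuron(input_spikes, weight=0.5, leak=0.9, threshold=1.0, refractory=2):
    """Simulate one synapse-neuron complex over discrete time steps.

    input_spikes: binary array of input events over time.
    weight:       synaptic scaling applied to each event.
    leak:         multiplicative decay of the membrane potential per step.
    threshold:    firing threshold; the membrane is reset when it is reached.
    refractory:   steps after a spike during which input is ignored.
    """
    v = 0.0                           # membrane potential (the neuron's memory)
    refrac_left = 0
    output_spikes = np.zeros_like(input_spikes)
    for t, s in enumerate(input_spikes):
        if refrac_left > 0:           # neuron is silent after firing
            refrac_left -= 1
            continue
        v = leak * v + weight * s     # in-memory computation: decay + integrate
        if v >= threshold:            # threshold reached: fire and reset
            output_spikes[t] = 1
            v = 0.0
            refrac_left = refractory
    return output_spikes

out = lif_neuron(np.array([1, 1, 0, 1, 1, 0, 0, 1], dtype=float))
```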
[004] Whilst SNNs are energy-efficient at performing event-driven pattern recognition tasks, they typically rely on preprocessing methods that are not automatically modified during SNN training. Traditional preprocessing methods are often power-hungry (for example spectrogram analysis like wavelet transforms) or restrict the use-case (like pass-band filters). There are several reasons for which these preprocessing techniques are typically employed, e.g. input data for SNNs must be encoded by an encoder, i.e. transformed from real values into temporal-coded binary events that span in time (i.e. spike trains), before being fed into the SNN. Examples of how encoders can be employed to create events out of real values would be any kind of temporal encoding, such as rate, group or precise spike-time encodings that are typically employed in the SNN domain (see Auge, Daniel, et al., "A survey of encoding techniques for signal processing in spiking neural networks", Neural Processing Letters 53.6 (2021): 4693-4710), or binarization techniques that are not commonly employed in the SNN domain, such as the Straight Through Estimator (STE) (see Bengio, Yoshua, Nicholas Léonard, and Aaron Courville, "Estimating or propagating gradients through stochastic neurons for conditional computation", arXiv preprint arXiv:1308.3432 (2013)), the Binary Spatter Code (see Kanerva, P. (1996, July), Binary spatter-coding of ordered k-tuples, in International Conference on Artificial Neural Networks (pp. 869-873), Berlin, Heidelberg: Springer), or any others. Another reason is that the features to be detected in sensor data span relatively long time periods with respect to the time constants of the spiking neurons’ internal dynamics, which are especially relevant to obtain good performance in e.g. leaky analog or mixed-signal chips.

[005] While traditional data processing techniques perform well on a range of applications, tailoring them for a specific application is a time-consuming task. Additionally, if the preprocessing techniques require parameter selection, their performance is usually capped by the outcome of the parameter search for the preprocessing. Once the preprocessing parameters are selected, the same preprocessing block using those parameters is usually employed throughout the whole Machine Learning (ML) model training process, regardless of the error gradient on the tasks that are being performed. In the case of preprocessing for ML models with a temporal component, e.g. particular dynamics of in-memory computation in SNNs, the above preprocessing seldom automatically optimises for representing the pre-processed data as events with desirable temporal dynamics.

[006] For instance, in optimization techniques like gradient descent, the preprocessing parameters remain static throughout the training process, even as the error gradient changes during each iteration. This fixed preprocessing can hinder performance, as the data representation may no longer be optimal for the evolving model. Similarly, in models with a temporal component where backpropagation through time (BPTT) is used to propagate errors across time steps, static preprocessing may fail to capture the temporal dependencies critical for effective learning.
In the case of spiking neural networks (SNNs), the situation is even more complex, as these models rely on event-based temporal dynamics that typical preprocessing techniques do not account for, further limiting their performance.

[007] For example, to decompose a Frequency Modulated Continuous Wave (FMCW) RADAR signal into its frequency components, a standard practice in the industry would be to compute its range Fast Fourier Transform (FFT). Although this would yield a precise depiction of the reflective objects in a room, finding optimal features and creating binary events for a task such as people counting will differ from a people tracking task. Furthermore, this task will be more difficult if the network analysing these events has some internal dynamics, as SNNs do.

[008] Automated feature extraction and conversion to events can complement and strengthen an SNN’s pattern recognition capabilities. The present invention implements this using CNNs because of their desirable properties described previously, and the system of the present invention comprises a forward path of CNN, encoder and SNN. Note that the term “encoder” is defined as being distinct from a CNN or SNN for the purposes of this description. An encoder transforms real input values (output by the CNN) into temporal-coded binary values (for input to the SNN). Note that under the given definitions of CNNs and encoders, CNNs are different from encoders since CNNs can be abstracted as functions that receive real values and output real values, whilst encoders receive real values, transform them, and output binary values. Even in the most extreme case in which the activation function of the CNNs is binary, e.g. when the activation function is the Straight Through Estimator (STE), the encoder is the STE itself. Notwithstanding, SNNs that receive non-binary input values are not considered as encoders in the context of the invention even if they output binary numbers, since their parameters can be jointly optimised in conjunction with binary SNNs without problem. Furthermore, while CNNs and SNNs both comprise a network of interconnected neurons and synapses, an encoder as defined herein does not include neurons and synapses.

[009] The state of the art has already employed CNNs together with SNNs in different ways. For example, Q. Xu et al., CSNN: an augmented spiking based framework with perceptron-inception, IJCAI 2018, describes a system in which a CNN is pre-trained with a fully-connected Artificial Neural Network (ANN), the ANN is then discarded and replaced by a temporal encoder and an SNN, and then the SNN is trained. S. Kim et al., C-DNN: A 24.5-85.8 TOPS/W complementary-deep-neural-network processor with heterogeneous CNN/SNN core architecture and forward-gradient-based sparsity generation, ISSCC 2023, describes using forward-gradient training for an SNN and backpropagation training of a CNN if the SNN forward-gradient is above a certain value. Other examples such as U.S. Patent Nos. 11,227,210 and 11,989,645 propose the use of spiking CNNs, rather than combining CNNs with SNNs.

[0010] Although CNNs are widely employed in the AI field, in previously known systems they are not trained together with SNNs within the same training flow.

SUMMARY OF INVENTION

[0011] A system and method are described herein for integrating the training of CNNs and SNNs in the same flow, optimising their deployment to the neuromorphic processor.
The invention differs from the prior methods by using a CNN to derive an output feature vector for input to an SNN, the full pipeline being optimised for deployment to the neuromorphic processor at the same time.

[0012] In a first aspect, the invention comprises a method of deploying a pipeline to a neuromorphic processor and subsequently operating the neuromorphic processor. The pipeline and the neuromorphic processor each comprise: a convolutional neural network adapted for receiving an input signal and generating corresponding real values; an encoder connected to receive the convolution result and generate corresponding temporal-coded binary values using an encoding function; and a spiking neural network connected to receive the temporal-coded binary values and generate a corresponding spiking output signal. The pipeline models at least a part of the neuromorphic processor.

[0013] The method comprises: computing for the pipeline a loss based on the input signal and the spiking output signal; computing for the pipeline backpropagated gradients based on the loss; transforming the backpropagated gradients using a fully differentiable surrogate function that is a surrogate of the encoding function; updating for the pipeline one or more parameters of the spiking neural network and one or more parameters of the convolutional neural network based on the transformed gradients; and deploying the one or more parameters of the spiking neural network and one or more parameters of the convolutional neural network of the pipeline to the spiking neural network and convolutional neural network respectively of the neuromorphic processor.

[0014] In an embodiment of the first aspect, the method may further comprise updating one or more parameters of the encoder of the pipeline based on the transformed gradients and subsequently deploying the updated one or more parameters of the encoder of the pipeline to the encoder of the neuromorphic processor. Note that the encoders may also have zero parameters that need to be updated according to the invention, as encoders may have a surrogate gradient.

[0015] In an embodiment of the first aspect, the transformed gradients generated by the surrogate function comprise a continuous approximation of the backpropagated gradients.

[0016] In an embodiment of the first aspect, the transformed gradients generated by the surrogate function comprise a continuous relaxation of the backpropagated gradients.

[0017] In an embodiment of the first aspect, the loss is used to compute the backpropagated gradients for use in updating the parameters of the spiking neural network, the convolutional neural network and/or the encoder of the pipeline using the chain rule, preferably wherein the backpropagated gradients depend on the loss, a spiking output of the forward pass, previous inputs, and/or in-memory computations and dynamics of neurons of the spiking neural network.

[0018] In an embodiment of the first aspect, the encoder is a temporal encoder, a rate encoder, a group encoder, a precise spike-time encoder, a thermometer encoder, and/or a straight-through-estimator encoder.

[0019] In an embodiment of the first aspect, the parameters of each element of the pipeline are initialised before processing of the input begins, preferably wherein the initialisation is performed by drawing samples from certain distributions, e.g. the uniform or Gaussian distribution, using genetic algorithms or other initialisation methods.
[0020] In an embodiment of the first aspect, the parameters for the convolutional neural network comprise one or more of convolutional filters (weights), biases, fully connected layer weights, and/or batch normalization parameters; and/or the parameters for the encoder comprise weights and biases of encoder layers, and/or parameters of learnable transformations; and/or the parameters for the SNN comprise one or more of synaptic weights, membrane potential threshold, STDP parameters (if using spike-timing learning), and/or neuron and synapse dynamics parameters.

[0021] In an embodiment of the first aspect, the method further comprises, after the step of updating one or more parameters of the spiking neural network and one or more parameters of the convolutional neural network of the neuromorphic processor based on the transformed gradients: the convolutional neural network of the neuromorphic processor receiving a sensor input signal and generating sensor real values corresponding to the sensor input signal; the encoder of the neuromorphic processor receiving the real values corresponding to the sensor input and generating corresponding temporal-coded sensor binary values using the encoding function; the spiking neural network of the neuromorphic processor receiving the temporal-coded sensor binary values and generating a corresponding sensor spiking output signal; and an inference module of the neuromorphic processor performing inference on the sensor input signal based on the sensor spiking output signal.

[0022] In an embodiment of the first aspect, the sensor input signal is obtained from an image sensor, an optical sensor, a LiDAR sensor, a RADAR sensor, an inertial measurement sensor, accelerometer sensor, vibration sensor, gas sensor, a proximity sensor, an acoustic sensor, an electroencephalography sensor, an electromyography sensor, and/or an electrocardiography sensor.

[0023] In a second aspect, the invention discloses a neuromorphic processor. The neuromorphic processor comprises: a convolutional neural network adapted for receiving an input signal and generating corresponding real values; an encoder connected to receive the real values and generate corresponding temporal-coded binary values using an encoding function; and a spiking neural network connected to receive the temporal-coded binary values and generate a corresponding spiking output signal.

[0024] One or more parameters of the spiking neural network and one or more parameters of the convolutional neural network are set by, for a pipeline that models at least the convolutional neural network, the encoder and the spiking neural network of the neuromorphic processor: computing for the pipeline a loss based on the input signal and the spiking output signal; computing for the pipeline backpropagated gradients based on the loss; transforming the backpropagated gradients using a fully differentiable surrogate function that is a surrogate of the encoding function; updating for the pipeline one or more parameters of the spiking neural network and one or more parameters of the convolutional neural network based on the transformed gradients; and deploying the one or more parameters of the spiking neural network and one or more parameters of the convolutional neural network of the pipeline to the spiking neural network and convolutional neural network respectively of the neuromorphic processor.
[0025] In an embodiment of the second aspect, one or more parameters of the encoder of the neuromorphic processor may be set by updating and subsequently deploying one or more parameters of the encoder of the pipeline based on the transformed backpropagated gradients. Note that the encoders may also have zero parameters that need to be updated according to the invention, as encoders may have a surrogate gradient.

[0026] In an embodiment of the second aspect, the transformed gradients generated by the surrogate function comprise a continuous approximation of the backpropagated gradients.

[0027] In an embodiment of the second aspect, the transformed gradients generated by the surrogate function comprise a continuous relaxation of the backpropagated gradients.

[0028] In an embodiment of the second aspect, the loss is used to compute the backpropagated gradients for use in updating the parameters of the spiking neural network, the convolutional neural network and/or the encoder of the pipeline using the chain rule, preferably wherein the backpropagated gradients depend on the loss, a spiking output of the forward pass, previous inputs, and/or in-memory computations and dynamics of neurons of the spiking neural network.

[0029] In an embodiment of the second aspect, the encoder is a temporal encoder, a rate encoder, a group encoder, a precise spike-time encoder, a thermometer encoder, and/or a straight-through-estimator encoder.

[0030] In an embodiment of the second aspect, the parameters of each element of the pipeline are initialised before processing of the input begins, preferably wherein the initialisation is performed by drawing samples from certain distributions, e.g. the uniform or Gaussian distribution, using genetic algorithms or other initialisation methods.

[0031] In an embodiment of the second aspect, the parameters for the convolutional neural network comprise one or more of convolutional filters (weights), biases, fully connected layer weights, and/or batch normalization parameters; and/or wherein parameters for the encoder comprise weights and biases of encoder layers, and/or parameters of learnable transformations; and/or wherein parameters for the SNN comprise one or more of synaptic weights, membrane potential threshold, STDP parameters (if using spike-timing learning), and/or neuron and synapse dynamics parameters.

[0032] In an embodiment of the second aspect, the convolutional neural network of the neuromorphic processor is configured to receive a sensor input signal and to generate sensor real values corresponding to the sensor input signal; the encoder of the neuromorphic processor is configured to receive the real values corresponding to the sensor input and to generate corresponding temporal-coded sensor binary values using the encoding function; and the spiking neural network of the neuromorphic processor is configured to receive the temporal-coded sensor binary values and to generate a corresponding sensor spiking output signal; and wherein the neuromorphic processor furthermore comprises an inference module, wherein the inference module is configured to perform inference of the sensor input signal based on the sensor spiking output signal.
[0033] In an embodiment of the second aspect, the sensor input signal is obtained from an image sensor, an optical sensor, a LiDAR sensor, a RADAR sensor, an inertial measurement sensor, an accelerometer sensor, a vibration sensor, a gas sensor, a proximity sensor, an acoustic sensor, an electroencephalography sensor, an electromyography sensor, and/or an electrocardiography sensor; preferably wherein the neuromorphic processor is electrically or operatively connected to the sensor that obtained the sensor input signal. [0034] Since CNNs are a general feature extractor, by integrating CNNs and SNNs in the same training flow the invention enables the following. (1) Creation of learnt end-to-end pipelines, e.g. going from the sensor data to the SNNs' output, or interleaving CNN, encoder, SNN blocks and other differentiable components in any order. (2) Trainable feature extraction for the SNN by the CNN. Features are extracted based on the joint optimisation of the CNN, encoder and SNN blocks, and any other differentiable parts in the pipeline. (3) The CNN-extracted features are encoded based on the joint optimisation of the CNN, encoder and SNN blocks, and any other differentiable parts in the pipeline. (4) Identification of optimised encodings that are specific to the task, to the features extracted by the CNNs, to the internal dynamics of the spiking neurons composing the SNNs, and to the overall system dynamics of the SNNs. (5) Ability to train CNNs and encoders to represent features significant to the temporal domain. (6) The invention allows for training the CNNs for the feature set of a specific application, allowing the power-performance of the feature extractor to be optimised or constrained for that application. (7) Parameters, including parameters of CNNs, encoders and SNNs, can be constrained during the software optimisation process to target different hardware components and their specifications. For example, the joint training allows for optimisation of different quantisation schemes, since CNNs and SNNs are typically deployed on different sorts of hardware accelerators. (8) Joint use of efficient CNN hardware accelerators in combination with the neuromorphic hardware, which can reduce chip area, latency and energy. (9) Shortening the embedding time with respect to tailored embedded solutions. (10) Creation of pipelines that include CNN, encoder and SNN blocks for inference such as e.g. classification, regression, segmentation, or generation tasks, among others. (11) Reduction of the number of parameters and operations of pre-established traditional CNN-fully-connected networks for tasks that require pattern recognition. BRIEF DESCRIPTION OF THE DRAWINGS [0035] Embodiments will now be described, by way of example only, with reference to the accompanying drawings in which corresponding reference symbols indicate corresponding parts, and in which: [0036] FIG.1 is a schematic diagram of a simple neural network; [0037] FIG. 2 is a simplified schematic diagram of synapse connections implemented as a crossbar array; [0038] FIG. 3 is a schematic diagram of a forward pass and a backward pass in a neuromorphic processor comprising a pipeline with both CNNs and SNNs, showing both the simulation and the deployment of the pipeline; [0039] FIG.4 is a block diagram of a hardware implementation of the neuromorphic processor of FIG. 3; [0040] FIG. 5 is a block diagram illustrating an example of data flow in the neuromorphic processor of FIG.3 and 4; and [0041] FIG. 
6 is a diagram illustrating data flow for a forward pass in a neuromorphic processor. DESCRIPTION OF EMBODIMENTS [0042] Hereinafter, certain embodiments will be described in further detail. It should be appreciated, however, that these embodiments are merely examples and should not be construed as limiting the scope of protection for the present disclosure. [0043] FIG.1 is a schematic diagram of a simple neural network 1 that may be implemented as a convolutional neural network (CNN) or spiking neural network (SNN). The CNN or SNN in this example comprises an input layer 2 of neurons 10 (input neurons), a hidden layer 3 of neurons 11 (hidden neurons), and an output layer 4 of neurons 12 (output neurons). The input neurons are connected via synaptic elements 17 to hidden neurons 11, and the hidden neurons 11 are connected via synaptic elements 18 to output neurons 12. The output 9 of the overall system is generated by the last layer of output neurons 12 in the network. The output 9 of the output neurons 12 may then be passed to a decoder 13 (that may be implemented as a decoding layer of the network), which can pass the information forward for further processing or output to the user. [0044] The CNN/SNN shown in FIG.1 is illustrated with only three layers having very few neurons and synaptic elements for simplicity, but a practical network may have a very large number of layers, neurons and synaptic elements to achieve satisfactory performance. A practical implementation of a CNN/SNN may comprise any other number of neurons and synapses. The neural network may be implemented using hardware circuits or a combination of hardware and software or firmware, and may be implemented as a single integrated circuit and/or as an embedded system. The neurons and synapses may be implemented using analog or digital circuits, or mixed-signal circuits. [0045] The input neurons 10 receive input signals from a signal source 6, such as a sensor. The signals from source 6 may first be processed by an encoder 7 to generate signals suitable for input to the neural network. For example, the signal source 6 may generate real values which need to be transformed into binary spikes where the neural network is an SNN. [0046] The input neurons 10 generate neuron output signals 14 in the form of a real value for a CNN, or in the form of a train of spikes for an SNN. The neurons 11, 12 in subsequent layers 3, 4 receive the output signals generated by synapses 17, 18 and generate neuron output signals 15, 16 in the form of real values or spikes. Each neuron 11, 12 receives a synapse output signal from one or more of the synapses 17, 18, depending on the configured synaptic connections. For example, every neuron in one layer may be connected via synapses to every neuron in the following layer as shown in FIG. 1, or the network may be configured to make selective connections via the synapses between selected neurons of adjacent layers. Many different connectivities between neurons may be used in addition to those described, including skip connections, highly recurrent liquid state machine architectures, etc. [0047] Each neuron 10, 11, 12 processes the received signals (input signals or synapse output signals) and generates a neuron output signal 14, 15, 16 as a function of the received input signals and the neuron’s current state. 
For an SNN, the neuron output signal will include a spike when the integrated value (referred to as the membrane potential) of the neuron reaches a predetermined threshold value. When the threshold is reached, the neuron fires, generating a spike (i.e. a voltage or current spike) at the neuron’s output, and reducing the membrane potential. Each neuron is thus configured to generate a neuron output signal 14, 15, 16 in the form of a spatio-temporal spike train. The neuron output signal 14, 15, 16 depends on the parameters of the neuron, such as the input gain, integration constant and threshold value. [0048] Each synaptic element 17, 18 (also referred to as a synapse) receives an output signal from one of the neurons 10, 11. The synapses 17, 18 amplify or attenuate the received output signal by a predetermined factor determined by their weight parameter, which is configurable. The weight of a synapse may be positive, so that a synaptic output signal received from that synapse excites the neurons which receive the signal, raising their membrane potentials. The weight may be negative, which inhibits the neurons which receive a synaptic output from that synapse, potentially lowering their membrane potentials. Or the weight may be zero, which effectively removes the synaptic connection between the two neurons connected via the synapse. The weight for each synapse may be stored in a memory cell associated with the synapse. The set of all the weights in the network is known as the weight matrix; the weights are typically determined by the network training process. [0049] FIG. 2 is a simplified schematic diagram of synapse connections implemented as a crossbar array. A crossbar design is an efficient way of implementing a reconfigurable neural network, especially when manufactured on an integrated circuit. The design in FIG.2 includes a rectangular array of synapses 17 used to interconnect two layers of the neural network, e.g. synapses 17 connecting neurons 10 on one side of the array to neurons 11 on another side, or synapses 18 connecting neurons 11 on one side of the array to neurons 12 on another side. In the embodiment in FIG. 2, neurons 10 are arranged in one column, each driving a row of synapses 17. The synapses 17 are connected in columns, with the outputs of all synapses 17 in a column added together and serving as the input to a neuron 11. By programming appropriate weights in the synapse array and correctly configuring the interconnect system, a wide variety of network topologies can be implemented. [0050] The neuromorphic processor of the present invention comprises one or more convolutional neural networks (CNNs) and one or more spiking neural networks (SNNs). The training may happen on e.g. a PC, using a simulator of the neuromorphic hardware. The simulator may not have to fully capture the elements and dynamics of the hardware. The networks may be deployed onto the hardware after training. The training may also be performed on the neuromorphic processor. The invention provides for joint training of the CNNs and SNNs as described herein. After training the pipeline, the trained pipeline is deployed on the neuromorphic processor. [0051] The CNNs provide task-dependent, hierarchical, translation-invariant feature extraction from the input signal. An encoder is provided to transform the output from the CNNs, representing the extracted features, into temporal-coded events for input to the SNNs. The SNNs provide efficient low-power inference, e.g. classification, of the extracted features. 
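Purely as an illustration of the neuron and crossbar behaviour described above (not part of the original disclosure), the following minimal Python/NumPy sketch integrates weighted crossbar inputs into a membrane potential that leaks, crosses a threshold and fires; all names and constants are illustrative placeholders:

import numpy as np

def lif_crossbar_step(spikes_in, weights, v, threshold=1.0, leak=0.9):
    # Crossbar behaviour: each column of synapses sums its weighted inputs
    # into the membrane current of one downstream neuron.
    current = spikes_in @ weights          # one value per output neuron
    v = leak * v + current                 # leaky integration of the membrane potential
    spikes_out = (v >= threshold).astype(float)
    v = np.where(spikes_out > 0, 0.0, v)   # reduce (here: reset) the potential after a spike
    return spikes_out, v

# Example: 4 input neurons driving 3 output neurons through a 4x3 crossbar;
# weights may be positive (excitatory), negative (inhibitory) or zero (no connection).
rng = np.random.default_rng(0)
weights = rng.normal(scale=0.5, size=(4, 3))
v = np.zeros(3)
spikes, v = lif_crossbar_step(np.array([1.0, 0.0, 1.0, 1.0]), weights, v)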
The combination thus provides an efficient neuromorphic processor for inference of input data, comprising e.g. classification, feature extraction and/or signal processing. [0052] The invention provides the computing paradigm for a “fully differentiable” flow to create the events based on the output produced by the CNNs. The term fully differentiable in this context means that the derivative or an approximation of the derivative exists at every point in its domain. Thus, fully differentiable includes finding approximate or surrogate gradients for the encoder if it is not fully differentiable per se. This requirement matters because a fully differentiable pipeline is a necessary condition for, for example, applying the chain rule to compute gradients when training using a gradient descent method. [0053] Joint training of the CNNs and SNNs enables the CNNs to be trained while taking into account internal dynamics of the SNNs. In prior methods, the CNN and SNN are trained independently, since they lack a differentiable encoder that can enable directly training CNNs and SNNs together. [0054] The input data typically originates from a sensor, and typically comprises real values, either analog (graded) values or a digital representation of analog values, e.g. two’s complement, signed-magnitude, etc. Whilst these digital representations are also binary, they do not take into account the spiking neurons’ dynamics, so trying to encode information using them would not take advantage of the SNNs’ capabilities. By creating a differentiable event encoding of the data, the neuromorphic processor of the present invention can use gradient descent (GD) or another learning algorithm to account for how these real-valued features should be translated into temporal-coded binary events for the SNN. [0055] From a higher-level perspective, by including CNNs, encoders and SNNs in the same training flow, the feature extraction and encoding change depending on how well the SNNs perform according to a defined loss and on the dynamics of the SNNs, and vice versa. Consequently, this enables the CNNs to extract task-dependent translation-invariant features and the SNNs to find sequential patterns of these features. Additionally, the necessity of creating a custom embedded solution for each of the highly specific preprocessing techniques and encodings decreases significantly. [0056] To better exemplify the method, an example using a typical gradient descent (GD) training flow is described below, i.e. having a forward pass through the processing pipeline and a backward pass for training. The forward pass generates an inference output based on the whole pipeline and the input. Then there is the computation of the loss, i.e. the function that quantifies the quality of the prediction of the pipeline model with respect to the ground truth. In most learning algorithms, including GD, the objective is to minimise the loss on average for all the provided data by updating the weights or other parameters of the model. In GD, this task is carried out by the backward pass, which computes the gradients of the loss with respect to the parameters in the pipeline, starting from the output node back to the first node by applying the chain rule, and also updates those parameters. Hereunder, a description is provided of how the invention may be used in this setup, but the invention is not limited to the details of this example. 
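To make the described forward/loss/backward flow concrete, here is a minimal sketch under the assumption of a PyTorch-style autograd framework (not prescribed by the disclosure); the four stand-in blocks are simple differentiable layers, since the real CNN, encoder, SNN and decoder internals are application-specific:

import torch
import torch.nn as nn

class DummyBlock(nn.Module):
    # Stand-in for a pipeline block; real blocks are application-specific.
    def __init__(self, n_in, n_out):
        super().__init__()
        self.fc = nn.Linear(n_in, n_out)

    def forward(self, x):
        return torch.relu(self.fc(x))

pipeline = nn.Sequential(              # forward pass: CNN -> encoder -> SNN -> decoder
    DummyBlock(16, 32),                # "CNN": feature extraction
    DummyBlock(32, 32),                # "encoder": real values -> events (stand-in)
    DummyBlock(32, 32),                # "SNN": temporal integration (stand-in)
    nn.Linear(32, 7),                  # "decoder": spiking output -> real-valued scores
)
optimizer = torch.optim.Adam(pipeline.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()        # quantifies prediction quality vs. ground truth

x = torch.randn(8, 16)                 # a batch of synthetic inputs
target = torch.randint(0, 7, (8,))     # synthetic ground-truth labels
for _ in range(3):                     # a few gradient descent steps
    loss = loss_fn(pipeline(x), target)
    optimizer.zero_grad()
    loss.backward()                    # backward pass: chain rule through every block
    optimizer.step()                   # joint update of all trainable parameters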
[0057] FIG.3 is a schematic diagram of a neuromorphic processor 30 comprising a pipeline with one or more CNNs 31, an encoder 32, one or more SNNs 33, and a decoder 34. FIG. 3 illustrates a forward pass through the pipeline for processing the input 40, and a backward pass for training the elements of the pipeline. The figure shows both a mathematical model or simulated version of the pipeline 30A, and a hardware implementation 30B that is coupled directly or indirectly to a sensor and which is used to perform inference on sensor input 36. Exemplary use cases for sensor data are described further below. [0058] The training on the mathematical model of the pipeline 30A can be performed in a simulation (e.g. on a computer that simulates the behaviour of the different hardware components that form the pipeline within a neuromorphic processor), or using the actual hardware, e.g. the hardware upon which the pipeline is implemented such as the neuromorphic processor 30. The model of the pipeline 30A describes the relevant parts of the neuromorphic processor that form and/or implement the pipeline. The neuromorphic processor can implement the pipeline using various accelerators that perform the functions of the different parts of the pipeline, such as for example CNN accelerators, SNN accelerators and/or encoder and decoder accelerators. [0059] The parameters of each element of the pipeline are initialised before processing of the input begins. This initialisation can be made by drawing samples from certain distributions, e.g. the normal or uniform distribution, using genetic algorithms or any other standard initialisation methods. Forward pass [0060] The pipeline receives an n-dimensional input 40 and processes the input in a forward pass through the pipeline to identify one or more features of the input 40. [0061] The CNN 31S performs a convolution operation on the input data 40 to project the input data to a higher or lower dimension, and extracts features from the input data. The CNN 31S generates output representing the extracted features in the real domain, so the output is subsequently fed to the encoder 32S so that it can be processed by the SNN 33S. [0062] The encoder 32S takes the output values 41 (that are real values) and transforms them into binary events in time, i.e. a series of temporal-coded binary spikes 42, which are suitable for input to the SNN 33S. [0063] The SNNs 33S integrate the input events 42 and generate a spiking output 43 which is fed into decoder 34S. The decoder 34S is an algorithmic block that converts the binary spiking output 43 of the SNNs 33S into real values 44. [0064] The output 44 generated by the decoder 34S is employed to compute the loss 45 using a loss function (for example, a mean squared error function). The loss 45 measures the quality of the inference performed by the pipeline with respect to the ground truth (e.g. the target for training with a labeled dataset). The output 44 may also be fed into another CNN or SNN depending on the application of the neuromorphic processor. Backward pass [0065] The trainable parameters of the CNN, encoder, decoder etc. are trained by performing a backward pass of training data through the pipeline, for joint training of the CNN 31S and SNN 33S, and optionally also the encoder 32S and/or the decoder 34S. After the loss 45 is computed, in this example, backpropagation starts. 
Note that the embodiment described in this example employs a gradient descent optimization method, but other learning algorithms may be used instead. [0066] The loss 45 is used to compute backpropagated gradients 46 for use in updating the parameters of the SNN 33S and CNN 31S (and optionally the encoder 32S and/or the decoder 34S), using the chain rule. The backpropagated gradients may depend on the loss 45, the spiking output 43 of the forward pass, the previous inputs 40, the neurons’ in-memory computations and the neuron dynamics of the SNN 33S, among other variables. [0067] One or more parameters of the SNN 33S may be updated based on the backpropagated gradients 46. The backpropagated gradients 46 are also used for updating one or more parameters of the CNN 31S and optionally also the encoder 32S. However, in order to use the backpropagated gradients 46 from the SNN 33S to update the CNN 31S and encoder 32S, the backpropagated gradients 46 are first transformed. This requires that the encoding function of the encoder 32S is fully differentiable, i.e. the encoder is a pipeline block that takes real values and transforms them into their binary representation while at the same time having or approximating a derivative for all of its domain. Note that surrogate gradient functions may also be applied to the SNNs. [0068] For implementation of this requirement, a surrogate function may be defined that is fully differentiable, i.e. has a derivative for all of its domain. The surrogate function is used to transform the backpropagated gradients 46 to derive transformed gradients 47, 48. The transformed gradients 47, 48 may then be used to update one or more parameters of the CNN 31S and optionally the encoder 32S. The transformed gradients 47, 48 may also be used to update one or more parameters of the SNN 33S. The transformed gradients 47, 48 generated by the surrogate function may comprise a continuous approximation of the backpropagated gradients 46, and the transformed gradients may be generated by a continuous relaxation of the backpropagated gradients. [0069] A simple example of a differentiable encoder is the Straight Through Estimator (STE). In the forward pass, the STE characterises a signal m(t) as given by: STE(m(t)) = U(m(t)), where U is the Heaviside function (a discontinuous function whose value is zero for negative numbers and one for positive numbers). Since the Heaviside function is not differentiable at m(t) = 0 and its derivative is zero in the rest of its domain, the backward pass of STE(m(t)) may be approximated by a surrogate function using the hardtanh function, defined as: hardtanh(m(t)) = −1 · U(−m(t) − 1) + m(t) · [U(m(t) + 1) − U(m(t) − 1)] + 1 · U(m(t) − 1), with STE(m(t)) ≈ hardtanh(m(t)). [0070] Hence, for an encoder 32 employing an STE encoding function, the backpropagated gradients 46 may be transformed for the backward pass using the surrogate function: STE(m(t))′ ≈ hardtanh(m(t))′.
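Assuming a PyTorch-style autograd framework (the disclosure does not prescribe one), such an STE encoder can be sketched as a custom autograd function whose forward pass applies the Heaviside step and whose backward pass substitutes the hardtanh derivative (1 inside [−1, 1], 0 outside):

import torch

class STE(torch.autograd.Function):
    # Forward: Heaviside step U(m(t)); backward: hardtanh derivative as surrogate.
    @staticmethod
    def forward(ctx, m):
        ctx.save_for_backward(m)
        return (m > 0).float()                     # binary events

    @staticmethod
    def backward(ctx, grad_output):
        (m,) = ctx.saved_tensors
        surrogate = ((m > -1) & (m < 1)).float()   # hardtanh': 1 on (-1, 1), 0 elsewhere
        return grad_output * surrogate             # transformed (surrogate) gradients

m = torch.randn(5, requires_grad=True)
events = STE.apply(m)          # forward: real values -> binary events
events.sum().backward()        # backward: gradients reach m through the surrogate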
[0071] Another example, for a simple thermometer encoding, is described below. For example, the thermometer encodes a positive integer value x over a time period, as described by the equation: Thermometer(x, t) = Σ_{τ=0} δ(t − τ), ∀ τ < x, ∀ τ ∈ T, where δ is the Dirac delta function, and T is a predefined maximum amount of time which is determined by the highest possible value of x. [0072] The function f(x, t) = Thermometer(x, t) is not differentiable, since there is no set of differentiable operations that can be applied to the value x so that it becomes Thermometer(x, t), especially given that Dirac’s delta function is discrete and not continuous. However, an approximation of Thermometer(x, t) can be used as a surrogate function to provide a fully differentiable function:
[Equation rendered as an image in the original document: a fully differentiable surrogate approximation of Thermometer(x, t), expressed in terms of a scalar b and a scaling function a(x, t) introduced below.]
where b is a scalar value that adjusts the overall gradient and a(x, t) is a function that scales the gradient depending on constraints of the hardware, such as minimum or maximum inter-spike interval, interactions between digital and/or analog components, the output values from the CNN, quantisation constraints, or other factors. Basically, this approximation compresses gradient values from a time representation into a time-free representation, since the CNN in general does not have a notion of time. However, since the surrogate function is computed based on the temporal dynamics of the SNN, it allows for optimisation of the SNN 33S based on the values coming from the CNN 31S. [0073] Note that the invention is not limited to the two examples described above, but includes the use of other surrogate functions that provide a fully differentiable approximation of the encoding function of encoder 32S, for transforming the backpropagated gradients 46 to derive transformed gradients 47, 48. These examples show how training the CNN and SNN jointly can benefit hardware optimisation, e.g. if the temporal dynamics are given by an analog SNN, this joint optimisation can be employed to adapt the CNN output to them. [0074] The use of the surrogate function enables the joint training of the pipeline, including the CNN 31S and SNN 33S, and optionally the encoder 32S. The training of CNN 31S, for example updating the synapse weights of CNN 31S employing the transformed gradients 48 computed using the chain rule, provides for improved feature extraction from the input 40 according to the computed loss 45 and all the previous steps. [0075] Note that training of the CNN and/or the encoder may result in the CNN or encoder generating outputs that fall outside of the data value boundaries supported by the hardware of the neuromorphic processor, the microcontroller and/or the system-on-chip. The trained pipeline enables joint optimisation of multiple networks for embedding, including cases where one or more networks are embedded on hardware with different constraints than the rest. This can be addressed, for example, by constraining the optimisation by changing the encoders’ surrogate functions to take the constraints into account throughout the joint training process. Hardware specifications such as data types, quantization, data ranges and data throughput can be taken into account when training. This results in an end-to-end optimised pipeline, making the most of the temporal dynamics of the SNN, while also matching the hardware specifications. [0076] Going back to the RADAR example, because of their hierarchical property, CNNs can replace the FFT to extract frequency-dependent features depending on the task that one is targeting. A differentiable encoder creates events out of the CNNs’ output, events which are subsequently fed into the SNNs. Since the CNNs and the SNNs are trained at the same time, the pipeline of the invention not only optimises feature extraction, but also feature encoding, and then the SNNs integrate these sequences of encoded features in time to carry out the inference, e.g. classification, task. If, for example, the frequency of the events exceeds the boundaries supported by the neuromorphic hardware, then the training optimisation may be constrained to make them fall within the boundaries, or to dynamically scale these events so that they match the hardware specifications, by changing the encoder’s surrogate function; a sketch of such a surrogate-equipped thermometer encoder follows below. 
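Under the same PyTorch assumption as the STE sketch above, a thermometer encoder with a hand-rolled surrogate backward pass could look as follows; the scalar b and the scaling function a(x, t) are trivial placeholders for the hardware-dependent terms described earlier:

import torch

T_MAX = 8  # maximum amount of time, set by the highest representable value of x

class ThermometerSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        t = torch.arange(T_MAX)
        # One spike at every timestep t with t < x: consecutive spikes, like a thermometer.
        return (t.unsqueeze(0) < x.unsqueeze(-1)).float()

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        b = 1.0                          # placeholder scalar adjusting the overall gradient
        a = torch.ones_like(x)           # placeholder for the hardware-dependent scaling a(x, t)
        # Compress the temporal gradient into one time-free value per input,
        # since the upstream CNN has no notion of time.
        return b * a * grad_output.sum(dim=-1)

x = torch.tensor([2.0, 5.0], requires_grad=True)
spikes = ThermometerSTE.apply(x)   # shape (2, 8): spike counts proportional to x
spikes.sum().backward()            # gradients flow back through the surrogate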
Therefore, the same CNNs-encoders-SNNs pipeline is able to address a counting task or a tracking task in a fully data-driven manner, taking into account the SNNs’ temporal dynamics. The same pipeline can be used, and trained end-to-end, for a specific purpose, without the need for time-consuming manual adjustment of the traditionally non-trainable stages. For example, the applications of people counting and people tracking seem similar, but require different pipelines to produce correct results. With end-to-end training, this is quite easily achieved. [0077] The CNNs’ weights are updated, employing the gradients computed using the chain rule, improving the feature extraction according to the loss and all the previous steps. Hardware deployment [0078] The CNN weights, the encoder and the SNN weights are deployed in the chip. The SNNs 33, which, if e.g. analog, will have certain physical parameters that have been modelled prior to simulation, are fed values by the encoder which are adapted to those parameters. An example of how they are adapted could be by constraining the input/output in the encoder through modifying its surrogate, and by the CNN having been trained taking into account those constraints and the physical modelling of the SNNs 33. [0079] The hardware pipeline comprises one or multiple sensors, one or multiple CNNs 31, one or multiple encoders 32, one or multiple SNNs 33, one or multiple decoders 34, and an inference module 38 that produces an inference result, e.g. a classification result. The inference process begins with data acquisition of sensor input data 36, either directly from a sensor or from sensor input data stored in memory. The sensors may be one or more cameras, radars, microphones, or other devices that collect input data. This data can take various forms, like images, audio signals, or physical measurements, depending on the sensor type. The sensory input may or may not be pre-processed, and contains a wide array of features, many of which may not be relevant for inference. To address this, the data first undergoes a feature extraction process using one or multiple CNNs. [0080] The CNN 31 processes the sensor data by applying convolutional filters, which identify and extract important patterns and features. These filters can detect low-level features such as edges in images or frequency components in audio. The CNN outputs a set of feature maps or feature vectors representing the extracted features in the real domain, which summarize the key characteristics of the original input data in a more compact form. [0081] Once the essential features are extracted, they are passed into an encoder 32. The encoder transforms the feature vectors into a format suitable for the next stage of processing by the SNN 33. After processing by the SNN, the resulting spike patterns are passed to a decoder 34, which converts the spiking signals back into a more interpretable format, often reconstructing feature representations that are more understandable to downstream processes. The decoder essentially reverses the encoding process, translating the sparse, spike-based information back into dense feature representations or activations that can be further analysed. [0082] Finally, the processed data is passed to an inference module 38, which interprets the decoded signals and e.g. assigns the input data to one of several predefined classes. For other types of tasks, the inference module can output e.g. a scalar value for regression tasks. The inference module’s decision is based on the output from the SNN. 
The final output is an inference result, for example a classification result, which could be a label, category, or probability distribution, depending on the specific application. This end-to-end pipeline allows for efficient processing of sensor data, enabling robust inference of complex inputs. [0083] Note that this is only an example and any number of CNNs 31, encoders 32 or SNNs 33 can be employed in the deployment. Also, their order can be interchanged, which allows for more complex architectures, e.g. autoencoders, or other tasks, e.g. regression. Hardware architecture to efficiently deploy CNN-SNN architectures. [0084] The described flow may entail starting with the modelling of, mainly, the SNN 33 physical properties, which are normally given by a differential equation or direct measurements in the case of analog SNNs. Furthermore, the quantisation that would be present in the synaptic part of the synapse-neuron complex may be taken into account. [0085] One may model the input/output constraints within the encoders, but in a way so that these constraints are present in the backward pass only, allowing the training to converge to a solution that does not hit these constraints. [0086] Training with the methods already presented, given the hardware constraints, is carried out, allowing for the already mentioned benefits and others, such as optimising for sparser encodings of the CNNs’ features, hence reducing the system’s dynamic power and making it in essence more efficient. Whilst training is employed in software, it can be used to optimise the hardware’s forward pass. [0087] FIG.4 is a block diagram of a possible hardware implementation of a neuromorphic processor 30 enabling joint training of CNN 31 and SNN 33 as described above. The architecture presented below includes the previously discussed computational blocks, i.e. hardware accelerators implementing the CNN 31, encoder 32, SNN 33, and decoder 34. These communicate via the interconnect 52 with a general-purpose microprocessor 53, memory 54, peripheral interfaces 55, direct memory access (DMA) 56 and hardware barriers 57. Input data (e.g. sensor data) is provided via input/output 58 to the peripheral interfaces 55. [0088] The microprocessor 53 acts as an orchestrator of data movements between internal computational elements 31-34 and external data acquisition or communication, and of manipulations that are not supported by, or optimal for, the other accelerators present on chip. This provides for flexibility and efficiency. [0089] Flexibility is enabled by allowing for arbitrary data movements through the main memory 54. For example, the output from CNN 31 can be transformed by encoder 32 directly on its path to the SNN 33, or the output can be sent to main memory 54, buffered or otherwise processed by the microprocessor 53, and then sent as input to the SNN 33. [0090] Efficiency is provided through multiple means. Firstly, the output of the CNN 31 is a relatively compressed data representation – it represents information at a higher level of abstraction, that is more separable than that produced by earlier layers. Secondly, the SNN 33 natively consumes and produces events which are sparse in time and space. [0091] For these reasons, the benefits of the design lie with the interconnect 52 and the data manipulation operations performed between the CNN 31 and SNN 33. 
Various implementations of the individual blocks (whether digital, analog mixed-signal, in-material compute, memristors or other novel methodologies) or variations in physical construction (system-on-chip, system-in-package, system-on-module etc.) may be accommodated without consequence to the operation of the system. Systems of different scales may benefit from physically separating the computational blocks on different systems connected through a network-level interconnect. Such a system (or node) could have distributed CNNs orchestrated locally by one or multiple microprocessors, which would then exchange information with each other and with one or more SNN systems. [0092] The input (sensor) data is received via the peripheral interfaces 55 and written into the main memory 54 by the DMA 56. The CNN 31 is configured either statically or dynamically, with parameters stored in memory, on or off chip, by the microprocessor 53. It is fed input data from the main memory 54, and its outputs are either sent to the encoder 32 so that they are transformed into events, or sent to memory 54, processed (e.g. serially or in parallel) by the microprocessor 53 to generate the events, and subsequently sent to the SNN 33. [0093] The hardware barriers 57 may be used to modulate the data transfer from the DMA 56 to the encoder 32. This arbitration is based on the availability of the data in the DMA 56 and on whether the encoder 32 needs to finish preexisting processes. This interaction involving the hardware barriers 57, the encoder 32, the CNN 31 and the DMA 56 is depicted in FIG.5 as a feedback loop, since if the encoder 32 is not ready, the DMA 56 may place the data back in the SRAM of the CNN 31. [0094] The SNN 33 is similarly configured either statically or dynamically, with parameters stored in memory, on or off chip, by the microprocessor 53. Once the SNN 33 receives the encoded input events, it performs an efficient temporal integration of the spatial features extracted by the CNN 31 from the input signal. The output of the SNN 33 is decoded, either by a dedicated decoder 34 or in software by the microprocessor 53. [0095] FIG. 5 is a block diagram illustrating an example of how the data can flow in the neuromorphic processor 30 of the FIG.3 and 4 embodiment. [0096] The figure is divided into two parts: a hardware side 60B and a simulation side 60A. The simulation side shows a schematic diagram of a forward pass and a backward pass in a simulated neuromorphic processor comprising a pipeline with both CNNs and SNNs. [0097] As mentioned before, in the forward pass the pipeline receives an n-dimensional input 40 and processes the input through the pipeline to identify one or more features of the input 40. The CNN 31S performs a convolution operation on the input data 40 to project the input data to a higher or lower dimension, and extracts features from the input data. The CNN 31S generates output 41 representing the extracted features in the real domain, so the output 41 is subsequently fed to the encoder 32S so that it can be processed by the SNN 33S. The encoder 32S takes the output values 41 (that are real values) and transforms them into binary events in time, i.e. a series of temporal-coded binary spikes 42, which are suitable for input to the SNN 33S. The SNNs 33S integrate the input events 42 and generate a spiking output 43 which is fed into decoder 34S. The decoder 34S is an algorithmic block that converts the binary spiking output 43 of the SNNs 33S into real values 44. 
The output 44 generated by the decoder 34S is employed to compute the loss 45 using a loss function (for example, a mean squared error function). [0098] In the backward pass, the pipeline is trained by performing a backward pass of training data through the pipeline, for joint training of the CNN 31S and SNN 33S, and optionally also the encoder 32S. After the loss 45 is computed, in this example, backpropagation starts. The loss 45 is used to compute backpropagated gradients 46 for use in updating the parameters of the SNN 33S and CNN 31S (and optionally the encoder 32S and/or the decoder), using the chain rule as described above. [0099] The obtained parameters of the SNN, CNN, the encoder and other parts of the pipeline can be exported (see arrow denoted A) to memory 54 comprised in the hardware implementation of a neuromorphic processor 30, as also shown by way of example above in relation to FIG.5. [00100] Parameters for the CNN can comprise for example convolutional filters (weights), biases, fully connected layer weights, and/or batch normalization parameters. Parameters for the encoder can comprise for example weights and biases of encoder layers, and/or parameters of learnable transformations. Parameters for the SNN can comprise for example synaptic weights, membrane potential threshold, STDP parameters (if using spike-timing learning), and/or neuron and synapse dynamics parameters. [00101] Parameters of the SNN in the simulated pipeline can be set based on a modelling of the physical dynamics of the SNN comprised in the hardware implementation. Likewise, input constraints of the simulated encoder can be modelled based on the encoder accelerators present in the hardware implementation. [00102] The hardware pipeline is the same as discussed for FIG. 3, with more details given about an exemplary implementation. [00103] First, the hardware is configured and initialised (see box B) by the microprocessor 53, by obtaining from memory 54 the parameters obtained through the forward and backward passes through the simulated pipeline, and initialising e.g. the CNN 31, SNN 33, encoders 32, and decoders 34 on the basis of the obtained parameters. For example, in the SNN 33 the weights of the synapses are set according to the weights obtained during the forward and backward passes. [00104] The hardware pipeline may comprise a sensor, one or multiple CNNs, one or multiple encoders, one or multiple SNNs, one or multiple decoders, and an inference module 38. The inference process begins with data acquisition of sensor input data 36, either directly from a sensor or from sensor input data stored in memory 54. The data first undergoes a feature extraction process using one or multiple CNNs. [00105] The CNN 31 processes the raw sensor data by applying convolutional filters, which identify and extract important patterns and features. The CNN outputs a set of feature maps or feature vectors representing the extracted features in the real domain, which summarize the key characteristics of the original input data in a more compact form. [00106] Once the essential features are extracted, they may be passed into an encoder 32. The encoder transforms the feature vectors into a format suitable for the next stage of processing by the SNN 33. After processing by the SNN, the resulting spike patterns are passed to a decoder 34. 
The decoder essentially reverses the encoding process, translating the sparse, spike-based information back into dense feature representations or activations that can be further analysed. [00107] Finally, the decoded data is interpreted depending on the kind of task that is being carried out, e.g. classification, regression, or other tasks. This may be done in an inference module 38, which receives the decoded data and performs inference on the decoded data. [00108] The Direct Memory Access (DMA) 56 may play a role in efficient data transfer within the pipelines involving sensors, CNNs, SNNs and the encoders and decoders. The DMA 56 can optimize how data moves between components. Without DMA, the microprocessor (e.g. CPU or GPU) typically manages the movement of data between the sensor (or any peripheral) and the memory. This requires the microprocessor’s intervention, which consumes processing cycles and may create a bottleneck, especially with large datasets like images or sensor streams. With DMA, the DMA 56 allows input sensor data 36 to be transferred directly from the sensor to memory 54 (e.g., RAM or GPU memory) without involving the microprocessor for every transfer. This allows the microprocessor to focus on more important tasks, like running the CNN or SNN computations. [00109] As mentioned before, hardware barriers 57 can be used in the hardware implementation. Hardware barriers are mechanisms that help synchronize and control the execution of tasks, particularly when different hardware components are working concurrently. In particular, hardware barriers are synchronization points that ensure proper ordering and timing of operations across different hardware units or components. They may prevent e.g. memory inconsistencies by forcing components to wait until certain conditions are met before proceeding with their tasks, such as to ensure that memory access and execution orders are respected. In multi-processing environments, where data flows between components (e.g., DMA, CPUs, GPUs, and neural network accelerators), hardware barriers may also prevent inconsistencies by ensuring that data flows in a proper order through the pipeline. [00110] After the CNN 31 processes data and passes feature maps to the encoder or SNN, it may use DMA 56 to move data between memory 54 and accelerators. If the next stage of the pipeline accesses memory before the data is fully written or transferred, it could lead to memory hazards (such as reading stale or incomplete data). A hardware barrier 57 ensures that the feature maps are fully written to memory or transferred before the next component (e.g., the encoder or SNN) starts reading or processing that data. This may prevent race conditions and ensures that each stage in the pipeline receives the correct input. The involved DMAs let the hardware barrier know that they have data available for the SNN and/or encoders, and the SNN and/or encoders can let the hardware barriers know that they are ready to receive data. The hardware barriers in each of these checks then determine (see checks C and D) whether data is available for transfer to the next step (e.g. SNN or encoder) of the pipeline (if so, then Y is followed). These two hardware barrier loops are shown by way of example in FIG.5. [00111] Hardware barriers 57 can also be used in other manners within the hardware implementation; these are not shown explicitly in FIG.5. 
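As a software analogy only (hardware barriers are circuit-level mechanisms, and none of these names come from the disclosure), the DMA-to-encoder handshake of checks C and D can be sketched with threading primitives: the DMA signals data availability, the encoder signals readiness, and data only moves once both conditions hold:

import threading
import queue

data_ready = threading.Event()      # DMA -> barrier: data is available (check C)
encoder_ready = threading.Event()   # encoder -> barrier: ready to receive (check D)
buffer = queue.Queue(maxsize=1)

def dma_transfer(feature_map):
    encoder_ready.wait()            # do not push data while the encoder is busy
    buffer.put(feature_map)
    data_ready.set()                # signal that a complete feature map is available

def encoder_stage():
    encoder_ready.set()             # declare readiness to receive
    data_ready.wait()               # block until the DMA has fully written the data
    feature_map = buffer.get()      # safe: no stale or partial reads
    data_ready.clear()
    return feature_map

producer = threading.Thread(target=dma_transfer, args=([1, 2, 3],))
producer.start()
events = encoder_stage()
producer.join()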
[00112] For example, after sensor data is transferred to memory via DMA, the CNN must wait until the transfer is complete before it can start processing the data. A hardware barrier in this situation can ensure that the CNN does not start processing the data until the DMA transfer is fully completed. This prevents a situation where the CNN tries to process incomplete or corrupted data due to premature access. [00113] As another example, neural networks often run in parallel across multiple hardware units (e.g., CPU, GPU, or neural accelerators). Intermediate data (such as weights, feature maps, or spike patterns) may be shared across these units. Without proper synchronization, one processing unit might modify data while another unit is still reading or writing to the same location. Hardware barriers may ensure that memory updates by one hardware unit are visible and complete before other hardware units can access the data. This may be critical in ensuring memory consistency in parallel processing. This prevents data corruption and ensures reliable and correct execution of parallel tasks. [00114] Furthermore, in real-time sensor-driven systems (such as robotics, medical devices, or autonomous vehicles), minimizing latency is critical. For example, while the SNN is processing the current batch of data, the CNN might already be working on the next batch, with DMA handling input data from the sensor. In this case, hardware barriers can ensure that different stages in the pipeline (CNN, SNN, decoder, etc.) are synchronized correctly without causing data conflicts or unnecessary delays, ensuring optimal performance in real-time operations. This allows for smooth, predictable data flow and real-time processing. [00115] As can be seen from the examples above, hardware barriers may be implemented at different levels of the hardware implementation depending on the architecture and pipeline complexity. For example, hardware barriers active in memory may ensure that all previous memory operations (reads/writes) are completed before any subsequent memory operations are executed. This may be important in managing e.g. DMA transfers, where data needs to be completely written to or read from memory before moving to the next stage. Hardware barriers may also be active on the execution/pipeline level and may ensure that the current stage of computation is fully finished before the next stage starts. In a neural network pipeline, this prevents stages (like the CNN, SNN, or decoder) from overlapping in ways that cause errors or data corruption. Problems and limitations solved or overcome by the invention [00116] Combining CNNs and SNNs (and optionally encoders and/or decoders) in the same training flow yields the advantages that have already been presented in the previous section. This overcomes problems including the following. (1) Lack of task-specificity and the related optimisation in the standard SNN encoders. Feature representation as events can require a high number of operations at the encoding stage. (2) Traditional CNN-fully-connected or CNN-only architectures are suboptimal, e.g. they require too many operations, and a method for joint training with a different network type like an SNN would be beneficial. (3) Tailored preprocessing, dimensionality reduction and feature extraction embedding is expensive in terms of time to implement, latency and power-efficiency. 
(4) Embedding of different encoders to translate real values into binary spikes is suboptimal for the extracted features and for the hardware’s neuron model and SNN dynamics. (5) Training of networks within hardware and/or use case constraints is likely to be suboptimal when the pipeline blocks are trained independently. (6) The search for the optimal parameter combinations for different pipeline elements is time and computation intensive, and identification of the optimal values is harder without using a loss function spanning multiple pipeline elements. (7) Unconstrained input events’ frequencies do not fit the hardware constraints. (8) Even if features are encoded as events, the events do not necessarily produce a desirable spiking behaviour in the SNNs; the encoded time-series of events may not account for the SNN dynamics, e.g. the decay or the reset of the membrane potential. (9) Translational-invariant features are key for the SNNs for the task that is intended to be solved. (10) Time integration of the CNNs’ high-level features is important for the task to be solved. Applications of the invention [00117] From a general perspective, the applications stemming from these inventions are chips that include a neuromorphic processor, since the invention allows for a data-driven way of extracting features and encoding data in a specific way from which only SNNs can benefit. [00118] Due to how general CNNs and SNNs can be, there is a wide range of applications in which they can be employed together. General usage applications include industrial maintenance, internet of things, wearable devices, object or people tracking, scene classification, pattern generation, gesture recognition, simulation of biological systems, navigation tasks, object or people detection or classification, object or people segmentation, key-word spotting, signal or image processing, etc. Specific usage applications include image and video recognition, image classification, image segmentation, image and language generation, medical image analysis, biomedical signal processing, LiDAR signal processing, RADAR signal processing, RADAR human presence detection, RADAR gesture recognition, audio scene classification, audio signal processing, natural language processing, brain–computer interfaces, automated driving, Inertial Measurement Unit human activity recognition, bearing fault diagnosis, etc. [00119] Following are some examples of applications of the neuromorphic processor 30 described above. Human Activity Recognition with Inertial Measurement Unit (IMU) [00120] An IMU is a composite of accelerometer, gyroscope and sometimes magnetometer sensors, which are capable of measuring acceleration, angular velocity and orientation respectively. IMUs, if worn by a human using some wearable (such as a smartwatch), can provide enough insight to detect or classify different tasks that are carried out by humans on a daily basis. IMU data is highly temporal, meaning that its information is contained not only in an event happening, but also in the event’s relative timing with respect to other events. As previously established, this is a scenario at which SNNs are good. [00121] A neuromorphic processor 30 with a pipeline comprising CNNs, the Straight Through Estimator (STE) as a differentiable encoder, and SNNs may be used to detect classes of human activity from IMU input data. The data used is the dataset provided in V. 
Fra et al., Human activity recognition: suitability of a neuromorphic approach for on-edge AIoT applications, Neuromorphic Computing and Engineering, 2(1), 014006 (2022). The data contains six channels (3 accelerometers and 3 gyroscopes), with 40 samples each. This dataset contains seven classes, which are described in Table 1.
Table 1. Classes of dataset for Human Activity Recognition. [0122] The pipeline’s input comprises 40 data samples from 6 sensors (3 accelerometers and 3 gyroscopes). The data is quantized to int8 then fed to the CNN, which is used as a preprocessing step for the SNN. The CNN may comprise 3 layers, with Rectified Linear Unit (ReLU) activations between them and different kernel sizes. Zero padding is used such that the 40x6 data format remains intact, but the output depth of the CNN is 10, i.e. the output size of the CNN is 40x6x10. The spatial features of the CNN are then reshaped along the last axis, yielding a data shape of 40x60. The data is then binarized using the STE encoding function as described previously, with the addition of a softmax layer in between the CNN and the encoder. The binary input spikes from the encoder are fed to the SNN, and the output spikes of the SNN are then fed to the decoder, which in this case is just the summation of the output spikes, and subsequent conversion into probabilities. Depending on the result of the decoder, one of the 7 classes contained in the dataset (Table 1) is predicted. [0123] The data flow for the forward pass through the pipeline 60 is illustrated in FIG.6. This pipeline can be implemented automatically in neuromorphic hardware as described herein, accelerating the embedding process and removing the need for intensive preprocessing. [0124] The IMU sensors 61 provide sensor data 62, which is fed into the CNN 63. While training the pipeline, the loss function is minimised using GD. The weight update is carried out based on the backpropagated loss, through the decoder 66 and the SNN 65. It reaches the STE encoder 64, which propagates the gradient coming from the SNN 65 within a certain range, as described previously. Conceptually, the temporal gradients that had been produced in the SNN 65 are propagated back through the STE to the softmax function and later to the CNN 63, giving them a sense of the SNN’s dynamics. The weight update continues until it reaches the first CNN layer. The weight update and the whole process is controlled by an optimizer, which in this case was Adam. All of the former procedure is iteratively carried out in batches, for several epochs, as is often done in standard Machine Learning (ML). The input size I is shown for each part of the pipeline. Audio Scene Classification [0125] In this example, the audio is 1 second long with a sampling frequency of 8kHz, producing 8000 float32 values. The dataset is the Low-Complexity Acoustic Scene Classification with Multiple Devices dataset, using only device A from the dataset. The 10s audio data is split into 10 x 1-second-long audios, as part of the pipeline design choice. [0126] The audio is resampled at 2kHz (2000 samples) by selecting every Nth value (N = 8000 / 2000 = 4). The maximum value out of the 2000 is selected through an O(N) search, then the values are normalised by doing a float division with the previously found maximum value. All the values are then rescaled to [-127, 127] int8 values (float32 multiplication by 128, casting to int8). [0127] The 2000 int8 values are now fed to a 1D CNN (actually a 2D CNN with one dimension always equal to 1). The CNN output is int8, with a shape of 8x1x8. The CNN has 2,056 parameters and requires ~65k MACs. The CNN is composed of a Convolution followed by a BatchNormalization, 2 pairs of ReLU + Convolution, a BatchNormalization, a Convolution, and a BatchNormalization. 
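The preprocessing arithmetic of paragraph [0126] can be sketched as follows in NumPy; the synthetic signal, the use of the absolute maximum for normalisation, and the factor 127 (rather than the 128 stated above, to stay strictly within the int8 range) are assumptions of this sketch:

import numpy as np

rng = np.random.default_rng(0)
audio = rng.standard_normal(8000).astype(np.float32)  # 1 s of synthetic audio at 8 kHz

N = 8000 // 2000                                      # N = 4
resampled = audio[::N]                                # select every Nth value -> 2000 samples

peak = np.abs(resampled).max()                        # O(N) search for the maximum value
normalised = resampled / peak                         # float division by the maximum

quantised = (normalised * 127).astype(np.int8)        # rescale to the [-127, 127] int8 range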
[0128] The next step in the pipeline is the encoding, where the 8x1x8 int8 values are reshaped to 64 int8 values, which are then encoded into spikes by using a Thermometer Encoder. After the encoding, there is an analog SNN simulated in hardware for 32 timesteps. The CNN int8 output values are cast back to float32, multiplied by a float32 constant, and then transformed to the 32x64 SNN input spikes (saved in a 32x2 uint32 array) by doing a few float32 additions, subtractions and multiplications. [0129] The thermometer encoder encodes values to spikes, given a value range and a number of timesteps. The higher the input value is with respect to the configured value range, the more consecutive spikes will be generated by the encoder, similarly to a real-life thermometer. These input spikes are then fed to the SNN and inference is performed. The encoder is differentiable as described previously, allowing for simultaneous training of the CNN and the analog SNN, hence allowing a better feature encoding for the analog SNN, since its physical properties were taken into account in that process. [0130] The SNN output spikes are decoded through summation of the output spikes. The number of spikes per output class is computed, and the predicted class is the one with the highest number of spikes (and the lower index, in case of a tie between classes). [0131] In the training, the end goal is the minimization of the loss function, performed with the help of gradient descent. The backward pass of the gradients starts at the decoding, passes through the SNN with the help of surrogate gradient descent, and finally reaches the encoder. The differentiable encoder allows gradients to flow from the SNN to the CNN, while extracting the temporal information and dynamics produced by the SNN. This process continues through the CNN until the very first layer of the forward pass has been reached. The whole training process is supervised by an optimizer, namely Adamax in this case. [00132] Note that features of any of the embodiments disclosed herein may be combined in an appropriate manner. CLAUSES [00133] As clause 1, a method of operating a neuromorphic processor comprising: a convolutional neural network adapted for receiving an input signal and generating corresponding real values; an encoder connected to receive the real values and generate corresponding temporal-coded binary values using an encoding function; and a spiking neural network connected to receive the temporal-coded binary values and generate a corresponding spiking output signal; the method comprising: computing a loss based on the spiking output signal; computing backpropagated gradients based on the loss; transforming the backpropagated gradients using a fully differentiable surrogate function that is a surrogate of the encoding function; and updating one or more parameters of the spiking neural network and one or more parameters of the convolutional neural network based on the transformed gradients. [00134] As clause 2, the method of clause 1, further comprising updating one or more parameters of the encoder based on the transformed gradients. [00135] As clause 3, the method of clause 1 or 2, wherein the transformed gradients generated by the surrogate function comprise a continuous approximation of the backpropagated gradients. [00136] As clause 4, the method of any one of the preceding clauses, wherein the transformed gradients generated by the surrogate function comprise a continuous relaxation of the backpropagated gradients. 
[00137] As clause 5, a neuromorphic processor comprising: a convolutional neural network adapted for receiving an input signal and generating corresponding real values; an encoder connected to receive the real values and generate corresponding temporal-coded binary values using an encoding function; a spiking neural network connected to receive the temporal-coded binary values and generate a corresponding spiking output signal; and one or more processors configured to: compute a loss based on the spiking output signal; compute backpropagated gradients based on the loss; transform the backpropagated gradients using a fully differentiable surrogate function that is a surrogate of the encoding function; and update one or more parameters of the spiking neural network and one or more parameters of the convolutional neural network based on the transformed gradients. [00138] As clause 6, the neuromorphic processor of clause 5, wherein the one or more processors are configured to update one or more parameters of the encoder based on the transformed backpropagated gradients. [00139] As clause 7, the neuromorphic processor of clause 5 or 6, wherein the transformed gradients generated by the surrogate function comprise a continuous approximation of the backpropagated gradients. [00140] As clause 8, the neuromorphic processor of any one of clauses 5-7, wherein the transformed gradients generated by the surrogate function comprise a continuous relaxation of the backpropagated gradients.

Claims

CLAIMS

1. A method of deploying a pipeline to a neuromorphic processor and subsequently operating the neuromorphic processor, the pipeline and the neuromorphic processor comprising: a convolutional neural network adapted for receiving an input signal and generating corresponding real values; an encoder connected to receive the real values and generate corresponding temporal-coded binary values using an encoding function; and a spiking neural network connected to receive the temporal-coded binary values and generate a corresponding spiking output signal; wherein the pipeline models at least a part of the neuromorphic processor; the method comprising: computing for the pipeline a loss based on the input signal and the spiking output signal; computing for the pipeline backpropagated gradients based on the loss; transforming the backpropagated gradients using a fully differentiable surrogate function that is a surrogate of the encoding function; updating for the pipeline one or more parameters of the spiking neural network and one or more parameters of the convolutional neural network based on the transformed gradients; and deploying the one or more parameters of the spiking neural network and one or more parameters of the convolutional neural network of the pipeline to the spiking neural network and convolutional neural network respectively of the neuromorphic processor.

2. The method of claim 1, further comprising updating one or more parameters of the encoder of the pipeline based on the transformed gradients and subsequently deploying the updated one or more parameters of the encoder of the pipeline to the encoder of the neuromorphic processor.

3. The method of claim 1 or 2, wherein the transformed gradients generated by the surrogate function comprise a continuous approximation of the backpropagated gradients.

4. The method of any one of the preceding claims, wherein the transformed gradients generated by the surrogate function comprise a continuous relaxation of the backpropagated gradients.

5. The method of any one of the preceding claims, wherein the loss is used to compute the backpropagated gradients for use in updating the parameters of the spiking neural network, the convolutional neural network and/or the encoder of the pipeline using the chain rule, preferably wherein the backpropagated gradients depend on the loss, a spiking output of the forward pass, previous inputs, and/or in-memory computations and dynamics of neurons of the spiking neural network.

6. The method of any one of the preceding claims, wherein the encoder is a temporal encoder, a rate encoder, a group encoder, a precise spike-time encoder, a thermometer encoder, and/or a straight-through-estimator encoder.

7. The method of any one of the preceding claims, wherein the parameters of each element of the pipeline are initialised before processing of the input begins, preferably wherein the initialisation is performed by drawing samples from certain distributions, e.g. the uniform or Gaussian distribution, using genetic algorithms or other initialisation methods.
8. The method of any one of the preceding claims, wherein the parameters for the convolutional neural network comprise one or more of convolutional filters (weights), biases, fully connected layer weights, and/or batch normalization parameters; and/or wherein parameters for the encoder comprise weights and biases of encoder layers, and/or parameters of learnable transformations; and/or wherein parameters for the SNN comprise one or more of synaptic weights, membrane potential threshold, STDP parameters (if using spike-timing learning), and/or neuron and synapse dynamics parameters.

9. The method of any one of the preceding claims, wherein the method further comprises, after the step of updating one or more parameters of the spiking neural network and one or more parameters of the convolutional neural network of the neuromorphic processor based on the transformed gradients: the convolutional neural network of the neuromorphic processor receiving a sensor input signal and generating corresponding sensor real values to the sensor input signal; the encoder of the neuromorphic processor receiving the real values corresponding to the sensor input and generating corresponding temporal-coded sensor binary values using the encoding function; the spiking neural network of the neuromorphic processor receiving the temporal-coded sensor binary values and generating a corresponding sensor spiking output signal; and an inference module of the neuromorphic processor performing inference on the sensor input signal based on the sensor spiking output signal.

10. The method of claim 9, wherein the sensor input signal is obtained from an image sensor, an optical sensor, a LiDAR sensor, a RADAR sensor, an inertial measurement sensor, an accelerometer sensor, a vibration sensor, a gas sensor, a proximity sensor, an acoustic sensor, an electroencephalography sensor, an electromyography sensor, and/or an electrocardiography sensor.

11. A neuromorphic processor comprising: a convolutional neural network adapted for receiving an input signal and generating corresponding real values; an encoder connected to receive the real values and generate corresponding temporal-coded binary values using an encoding function; a spiking neural network connected to receive the temporal-coded binary values and generate a corresponding spiking output signal; and wherein one or more parameters of the spiking neural network and one or more parameters of the convolutional neural network are set by, for a pipeline that models at least the convolutional neural network, the encoder and the spiking neural network of the neuromorphic processor: computing for the pipeline a loss based on the input signal and the spiking output signal; computing for the pipeline backpropagated gradients based on the loss; transforming the backpropagated gradients using a fully differentiable surrogate function that is a surrogate of the encoding function; updating for the pipeline one or more parameters of the spiking neural network and one or more parameters of the convolutional neural network based on the transformed gradients; and deploying the one or more parameters of the spiking neural network and one or more parameters of the convolutional neural network of the pipeline to the spiking neural network and convolutional neural network respectively of the neuromorphic processor.
12. The neuromorphic processor of claim 11, wherein one or more parameters of the encoder of the neuromorphic processor are set by updating and subsequently deploying one or more parameters of the encoder of the pipeline based on the transformed backpropagated gradients.

13. The neuromorphic processor of claim 11 or 12, wherein the transformed gradients generated by the surrogate function comprise a continuous approximation of the backpropagated gradients.

14. The neuromorphic processor of any one of claims 11-13, wherein the transformed gradients generated by the surrogate function comprise a continuous relaxation of the backpropagated gradients.

15. The neuromorphic processor of any one of claims 11-14, wherein the loss is used to compute the backpropagated gradients for use in updating the parameters of the spiking neural network, the convolutional neural network and/or the encoder of the pipeline using the chain rule, preferably wherein the backpropagated gradients depend on the loss, a spiking output of the forward pass, previous inputs, and/or in-memory computations and dynamics of neurons of the spiking neural network.

16. The neuromorphic processor of any one of claims 11-15, wherein the encoder is a temporal encoder, a rate encoder, a group encoder, a precise spike-time encoder, a thermometer encoder, and/or a straight-through-estimator encoder.

17. The neuromorphic processor of any one of claims 11-16, wherein the parameters of each element of the pipeline are initialised before processing of the input begins, preferably wherein the initialisation is performed by drawing samples from certain distributions, e.g. the uniform or Gaussian distribution, using genetic algorithms or other initialisation methods.

18. The neuromorphic processor of any one of claims 11-17, wherein the parameters for the convolutional neural network comprise one or more of convolutional filters (weights), biases, fully connected layer weights, and/or batch normalization parameters; and/or wherein parameters for the encoder comprise weights and biases of encoder layers, and/or parameters of learnable transformations; and/or wherein parameters for the SNN comprise one or more of synaptic weights, membrane potential threshold, STDP parameters (if using spike-timing learning), and/or neuron and synapse dynamics parameters.

19. The neuromorphic processor of any one of claims 11-18, wherein the convolutional neural network of the neuromorphic processor is configured to receive a sensor input signal and to generate corresponding sensor real values to the sensor input signal; wherein the encoder of the neuromorphic processor is configured to receive the real values corresponding to the sensor input and to generate corresponding temporal-coded sensor binary values using the encoding function; and wherein the spiking neural network of the neuromorphic processor is configured to receive the temporal-coded sensor binary values and to generate a corresponding sensor spiking output signal; and wherein the neuromorphic processor furthermore comprises an inference module, wherein the inference module is configured to perform inference of the sensor input signal based on the sensor spiking output signal.
20. The neuromorphic processor of claim 19, wherein the sensor input signal is obtained from an image sensor, an optical sensor, a LiDAR sensor, a RADAR sensor, an inertial measurement sensor, an accelerometer sensor, a vibration sensor, a gas sensor, a proximity sensor, an acoustic sensor, an electroencephalography sensor, an electromyography sensor, and/or an electrocardiography sensor; preferably wherein the neuromorphic processor is electrically or operatively connected to the sensor that obtained the sensor input signal.
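For illustration of the deployed inference path of claims 9, 10, 19 and 20, a short hedged sketch follows. `ThermometerEncode` is the assumed encoder from the earlier sketch, and the CNN/SNN modules are hypothetical stand-ins, not the claimed hardware or its inference module.

```python
import torch

def infer(sensor_sample: torch.Tensor,
          cnn: torch.nn.Module, snn) -> int:
    """Sensor input -> CNN real values -> thermometer spikes -> SNN output
    spikes -> spike-count decoding, mirroring claims 9 and 19."""
    with torch.no_grad():
        real_values = cnn(sensor_sample)                     # sensor real values
        spikes_in = ThermometerEncode.apply(real_values, -1.0, 1.0, 32)
        spikes_out = snn(spikes_in)      # stand-in returns (timesteps, classes)
        return int(torch.argmax(spikes_out.sum(dim=0)))      # most spikes wins

# Stand-ins so the sketch runs; a deployed system would use the trained,
# deployed parameters and the processor's own inference module instead.
cnn = torch.nn.Linear(8, 64)
snn = lambda s: (torch.randn(32, 10) > 1.0).float()  # fake binary output spikes
print(infer(torch.randn(8), cnn, snn))
```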
PCT/EP2024/076619 2023-09-21 2024-09-23 Feature extraction and encoding of spiking neural networks using convolutional neural network and trainable encoders for deployment in neuromorphic chips Pending WO2025062034A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202363539655P 2023-09-21 2023-09-21
US63/539,655 2023-09-21

Publications (1)

Publication Number Publication Date
WO2025062034A1

Family

ID=92900035

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2024/076619 Pending WO2025062034A1 (en) 2023-09-21 2024-09-23 Feature extraction and encoding of spiking neural networks using convolutional neural network and trainable encoders for deployment in neuromorphic chips

Country Status (2)

Country Link
TW (1) TW202514436A (en)
WO (1) WO2025062034A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120095829A (en) * 2025-04-22 2025-06-06 安徽华电宿州发电有限公司 A hook-removing robot control method and system based on deep visual recognition

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120635091B (en) * 2025-08-14 2025-10-28 浙江大学 On-chip image recognition method based on ultra-high switching ratio memristor drive


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11227210B2 (en) 2019-07-25 2022-01-18 Brainchip, Inc. Event-based classification of features in a reconfigurable and temporally coded convolutional spiking neural network
US11989645B2 (en) 2019-07-25 2024-05-21 Brainchip, Inc. Event-based extraction of features in a convolutional spiking neural network

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
AUGE, DANIEL ET AL.: "A survey of encoding techniques for signal processing in spiking neural networks", NEURAL PROCESSING LETTERS, vol. 53, no. 6, 2021, pages 4693 - 4710, XP037605724, DOI: 10.1007/s11063-021-10562-2
BENGIO, YOSHUA; LÉONARD, NICHOLAS; COURVILLE, AARON: "Estimating or propagating gradients through stochastic neurons for conditional computation", ARXIV PREPRINT ARXIV:1308.3432, 2013
KANERVA, P.: "Binary spatter-coding of ordered k-tuples", INTERNATIONAL CONFERENCE ON ARTIFICIAL NEURAL NETWORKS, July 1996 (1996-07-01), pages 869 - 873, XP019185328
V. FRA ET AL.: "Human activity recognition: suitability of a neuromorphic approach for on-edge AloT applications", NEUROMORPHIC COMPUTING AND ENGINEERING, vol. 2, no. 1, 2022, pages 014006
XU QI ET AL: "CSNN: An Augmented Spiking based Framework with Perceptron-Inception", PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 13 July 2018 (2018-07-13), California, pages 1646 - 1652, XP055819339, ISBN: 978-0-9992411-2-7, Retrieved from the Internet <URL:https://www.ijcai.org/proceedings/2018/0228.pdf> DOI: 10.24963/ijcai.2018/228 *


Also Published As

Publication number Publication date
TW202514436A (en) 2025-04-01

Similar Documents

Publication Publication Date Title
Capra et al. Hardware and software optimizations for accelerating deep neural networks: Survey of current trends, challenges, and the road ahead
Wason Deep learning: Evolution and expansion
Han et al. A review of deep learning models for time series prediction
Hosseini et al. Deep learning architectures
US20200104726A1 (en) Machine learning data representations, architectures, and systems that intrinsically encode and represent benefit, harm, and emotion to optimize learning
US11087086B2 (en) Named-entity recognition through sequence of classification using a deep learning neural network
Yu et al. Mixed pooling for convolutional neural networks
CN111797895A (en) A classifier training method, data processing method, system and device
WO2025062034A1 (en) Feature extraction and encoding of spiking neural networks using convolutional neural network and trainable encoders for deployment in neuromorphic chips
CN117237756A (en) Method for training target segmentation model, target segmentation method and related device
Tripathi et al. Image classification using small convolutional neural network
Krig Feature learning and deep learning architecture survey
Deshwal et al. A comprehensive study of deep neural networks for unsupervised deep learning
Tareen et al. Convolutional neural networks for beginners
Chahal et al. Deep learning: a predictive IoT data analytics method
Larsen Introduction to artificial neural networks
Palit et al. Biomedical image segmentation using fully convolutional networks on TrueNorth
Jang et al. Training dynamic exponential family models with causal and lateral dependencies for generalized neuromorphic computing
Haas Tutorial: Artificial Neural Networks for Discrete-Event Simulation
Liu et al. Knowledge distillation between dnn and snn for intelligent sensing systems on loihi chip
Shivappriya et al. Performance Analysis of Deep Neural Network and Stacked Autoencoder for Image Classification
Gangal et al. Neural Computing
Lyche On deep learning and neural networks
Sharma et al. Neural Network
Liu et al. Fpt-spike: a flexible precise-time-dependent single-spike neuromorphic computing architecture

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24776883

Country of ref document: EP

Kind code of ref document: A1