WO2024197457A1 - Closed-loop adaptive brain-machine interface decoding method and device based on reinforcement learning
Closed-loop adaptive brain-machine interface decoding method and device based on reinforcement learning
- Publication number
- WO2024197457A1 (PCT application PCT/CN2023/083704)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- decoder
- reinforcement learning
- closed
- brain
- neural
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
Definitions
- the present invention relates to the field of neural signal decoding, and in particular to a closed-loop adaptive brain-computer interface decoding method and device based on reinforcement learning.
- a brain-machine interface decoder is a computational model designed to translate neural activity into human-understandable instructions for controlling external devices, such as prosthetic limbs, wheelchairs, computer games, etc. This technology is an important component of brain-machine interface (BMI).
- BMI: brain-machine interface
- the brain-machine interface decoder works by converting neural signals from the brain into instructions for controlling external devices. These neural signals are usually recorded through implanted electrodes or using non-invasive neuroimaging techniques such as electroencephalography. By decoding and analyzing these signals, the computational model can recognize human intentions and behaviors and convert them into instructions that the machine can understand, thereby achieving the purpose of controlling external devices.
- Brain-machine interface decoders have applications in both clinical and laboratory settings.
- in clinical settings, brain-machine interface decoders can help people who have lost motor function regain the ability to control prosthetic limbs or wheelchairs; in the laboratory, they can help neuroscientists better understand how the brain controls behavior and movement and provide basic research support for the development of new brain-machine interface technologies.
- Closed-loop adaptive decoder (closed-loop decoder adaptation) is a brain-computer interface decoder paradigm that can make real-time adjustments based on feedback signals to improve the accuracy and reliability of the decoder. Compared with traditional open-loop decoders, it can better adapt to different neural signals and actual usage situations, thereby improving the performance and stability of brain-computer interface systems. Closed-loop adaptive decoders usually consist of two main parts: a decoder and a feedback control loop. The decoder is responsible for converting neural signals into control signals, while the feedback control loop provides real-time feedback signals for adjustment. This adjustment can be achieved in a variety of ways, such as adjusting decoder parameters, selecting different neural signal features, or adjusting the time delay of the feedback signal.
- Closed-loop adaptive decoders can help overcome many common problems, such as unstable signals, signal loss, and nonlinear neural responses. They can also better adapt to individual differences and dynamically changing brain activity, thereby improving the accuracy and reliability of the decoder so that external devices can be controlled, or behavioral tasks performed, more effectively. Closed-loop adaptive decoders have been widely used in brain-computer interface research and in many application scenarios, such as prosthetic control and brain-controlled games. In the future, as brain-computer interface technology continues to develop, closed-loop adaptive decoders will continue to play an important role in helping humans better control and interact with external devices and in providing deeper insight for neuroscience research.
- current decoders fall mainly into two categories: traditional direct decoding and the newer closed-loop adaptive decoding paradigm.
- traditional decoders that directly decode neural signals are used mainly for offline decoding of previously recorded data.
- these decoding methods mostly use RNN networks, for example long short-term memory (LSTM) networks, as decoders to extract temporal information from the neural signals.
- LSTM: long short-term memory network
- Wu, Zhang and others use this paradigm for decoding, in which the decoder module mainly uses methods such as Kalman filtering or support vector regression.
- the embodiment of the present invention provides a closed-loop adaptive brain-computer interface decoding method and device based on reinforcement learning, so as to design a decoder that can both quickly decode and ensure decoding accuracy for application in the brain-computer interface.
- a closed-loop adaptive brain-computer interface decoding method based on reinforcement learning, comprising the following steps:
- S101: preprocessing the collected neural signals of the test animal;
- S102: inputting the preprocessed neural signals into the decoder and mapping them into specific action instructions, the test animal receiving visual feedback on the executed action instructions and generating new neural signals;
- S103: the new neural signals are input to the decoder, which generates new action instructions, forming a closed loop that integrates reinforcement learning theory with the brain-computer interface decoding paradigm.
- the method further comprises:
- S100: collecting neural signals from the test animal.
- preprocessing includes amplification, filtering, and denoising.
- the specific action instruction includes moving a cursor to a specified position.
- after the decoder generates a new action instruction, it assists the test animal in moving the cursor to the fixed position by setting constraints.
- the method includes a training part and a testing part
- in the training part, the monkey is trained to push the joystick to move the cursor to the specified position and is rewarded for doing so; a supervised-learning neural network serves as the decoder in this stage, with the target used as the supervised-learning label. Training establishes a mapping between neural signals and actions, and the trained model is saved;
- the monkey controlled the cursor movement through imagination.
- the monkey generated the intention to control the cursor movement by observing the position of the cursor on the screen.
- the decoder was replaced with a reinforcement learning algorithm, which then performs the decoding.
- the reinforcement learning algorithm is DQN; the network that maps states to actions is initialized directly with the model parameters trained during the training phase by the supervised neural network.
- the monkey used a joystick to control the cursor on the page to move. At this time, a circle of a different color appeared in each trial. After training, the monkey knew that moving the current cursor to the target position would be rewarded;
- at the beginning of each trial the neural signal is collected; the collected signal then needs to be amplified and denoised.
- the processed data has 256 channels, each containing half-precision floating-point data sampled at 30 kHz.
- the collected data is then downsampled to a 1 kHz sampling rate using fixed-interval sampling; after that, the data is converted to the frequency domain and features in specific frequency bands are extracted.
- the frequency band is set according to the task; these features serve as the neural network input, the target position serves as the label, a multi-layer perceptron network is trained on this classification task, and the trained model is saved as a .pth file.
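- The training-stage pipeline above (downsampling, band-specific features, a supervised multi-layer perceptron, saving a .pth file) can be sketched as follows. This is a minimal illustration under assumed shapes, band limits, and hyperparameters, not the patented implementation.

```python
# Illustrative sketch of the supervised training stage described above.
# Channel count (256) and sampling rates follow the text; the frequency band,
# network sizes, and action count are assumptions.
import numpy as np
import torch
import torch.nn as nn

def preprocess(raw, fs_in=30_000, fs_out=1_000, band=(15.0, 30.0)):
    """raw: (256, n_samples) signal. Returns one band-power feature per channel."""
    x = raw.astype(np.float32)[:, ::fs_in // fs_out]        # fixed-interval downsampling to 1 kHz
    spectrum = np.abs(np.fft.rfft(x, axis=1))                # convert to the frequency domain
    freqs = np.fft.rfftfreq(x.shape[1], d=1.0 / fs_out)
    mask = (freqs >= band[0]) & (freqs <= band[1])           # task-specific frequency band
    return spectrum[:, mask].mean(axis=1)                    # shape (256,)

class MLPDecoder(nn.Module):
    """Multi-layer perceptron mapping band-power features to discrete cursor actions."""
    def __init__(self, n_channels=256, n_actions=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_channels, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x):
        return self.net(x)

def train_supervised(features, target_labels, epochs=50):
    """features: (n_trials, 256); target_labels: (n_trials,) target-position class indices."""
    model = MLPDecoder()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    x = torch.tensor(features, dtype=torch.float32)
    y = torch.tensor(target_labels, dtype=torch.long)
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    torch.save(model.state_dict(), "decoder.pth")            # reused later by the DQN decoder
    return model
```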
- the monkeys used imagination to control the cursor movement to eventually reach the target area
- the decoder first loads the previously saved .pth model, specifically the DQN network acting as the decoder.
- the DQN network uses the same network structure as the previous supervised deep learning.
- the trained parameter model is loaded to extract and decode the input neural signal features and output the corresponding action instructions.
- the output instruction updates the cursor position on the page.
- the monkey knows the cursor change through visual feedback and generates new neural signals.
- the new neural signals are input into the decoder, and the decoder generates new actions, thus forming a closed loop.
- a closed-loop adaptive brain-computer interface decoding device based on reinforcement learning comprising:
- a preprocessing unit used for preprocessing the collected neural signals of the test animal
- a new neural signal generating unit, used for inputting the preprocessed neural signal into the decoder and mapping it into a specific action instruction, the test animal receiving visual feedback on the executed action instruction and generating a new neural signal;
- the new action command generation unit is used to input new neural signals into the decoder, and the decoder generates new action commands, forming a closed loop that integrates reinforcement learning theory and brain-computer interface decoding paradigm.
- the device also includes:
- the signal acquisition unit is used to collect neural signals from the test animal.
- a storage medium stores a program file capable of implementing any one of the above-mentioned closed-loop adaptive brain-computer interface decoding methods based on reinforcement learning.
- a processor is used to run a program, wherein when the program is running, any one of the above-mentioned closed-loop adaptive brain-computer interface decoding methods based on reinforcement learning is executed.
- in the closed-loop adaptive brain-computer interface decoding method and device based on reinforcement learning of the embodiments of the present invention, the preprocessed neural signals are input into the decoder and mapped into specific action instructions.
- the test animal provides visual feedback on the specific action instructions to generate new neural signals.
- the new neural signals are input into the decoder, and the decoder generates new action instructions, forming a closed loop.
- the reinforcement learning theory and the brain-computer interface decoding paradigm are integrated. The brain and the decoder learn from each other and adapt to each other without the need for label signals, which is in line with the laws of biological learning.
- FIG1 is a flow chart of a closed-loop adaptive brain-computer interface decoding method based on reinforcement learning of the present invention
- FIG2 is a preferred flow chart of a closed-loop adaptive brain-computer interface decoding method based on reinforcement learning of the present invention
- FIG3 is a schematic diagram of a closed-loop adaptive decoder in the present invention.
- FIG4 is a diagram of a reinforcement learning model in the present invention.
- FIG5 is a demonstration diagram of a closed-loop adaptive decoder in the present invention.
- FIG6 is a scene diagram of the macaque training stage in the present invention.
- FIG7 is a module diagram of a closed-loop adaptive brain-computer interface decoding device based on reinforcement learning of the present invention
- FIG8 is a preferred module diagram of a closed-loop adaptive brain-computer interface decoding device based on reinforcement learning of the present invention.
- a closed-loop adaptive brain-computer interface decoding method based on reinforcement learning is provided, referring to FIG1, comprising the following steps:
- S101: preprocessing the collected neural signals of the test animal;
- S102: inputting the preprocessed neural signals into the decoder and mapping them into specific action instructions, the test animal receiving visual feedback on the executed action instructions and generating new neural signals;
- S103: the new neural signals are input to the decoder, which generates new action instructions, forming a closed loop that integrates reinforcement learning theory with the brain-computer interface decoding paradigm.
- the closed-loop adaptive brain-computer interface decoding method based on reinforcement learning in the embodiment of the present invention inputs the preprocessed neural signal into the decoder and maps it into specific action instructions.
- the test animal generates new neural signals through visual feedback on the executed specific action instructions.
- the new neural signals are input into the decoder, and the decoder generates new action instructions, forming a closed loop.
- the reinforcement learning theory and the brain-computer interface decoding paradigm are integrated. The brain and the decoder learn from each other and adapt to each other without the need for label signals, which is in line with the law of biological learning.
- the method further includes:
- S100: collecting neural signals from the test animal.
- preprocessing includes amplification, filtering, and denoising.
- the specific action instruction includes moving the cursor to a specified position. After the decoder generates a new action instruction, it assists the test animal to move the cursor to a fixed position by setting constraints.
- the method includes a training part and a testing part;
- in the training part, the monkey is trained to push the joystick to move the cursor to the specified position and is rewarded for doing so; a supervised-learning neural network serves as the decoder in this stage, with the target used as the supervised-learning label. Training establishes a mapping between neural signals and actions, and the trained model is saved;
- the monkey controlled the cursor movement through imagination.
- the monkey generated the intention to control the cursor movement by observing the position of the cursor on the screen.
- the decoder was replaced with a reinforcement learning algorithm, which then performs the decoding.
- the reinforcement learning algorithm is DQN; the network that maps states to actions is initialized directly with the model parameters trained during the training phase by the supervised neural network.
- the reinforcement learning algorithm may alternatively be Q-learning, SARSA, or another deep reinforcement learning algorithm.
- the monkey used a joystick to control the cursor on the page to move. At this time, a circle of a different color appeared in each trial. After training, the monkey knew that moving the current cursor to the target position would be rewarded.
- neural signals are collected. After the signals are collected, they need to be amplified and denoised.
- the processed data is 256 channels, half-precision floating-point data with a sampling rate of 30kHz for each channel.
- the collected data is then downsampled to 1kHz using a fixed interval sampling method.
- the data is then converted to the frequency domain, and specific frequency band features are extracted.
- the frequency band is set according to the specific task. These features are used as neural network inputs, and the target position is used as a label.
- a multi-layer perceptron network is used to implement the classification task for the above data, and the trained model is saved as a .pth file.
- the monkey controlled the cursor movement through imagination and finally reached the target area;
- the decoder first loads the previously saved .pth model, specifically the DQN network acting as the decoder.
- the DQN network uses the same network structure as the previous supervised deep learning.
- the trained parameter model is loaded to extract and decode the input neural signal features and output the corresponding action instructions.
- the output instruction updates the cursor position on the page.
- the monkey knows the cursor change through visual feedback and generates new neural signals.
- the new neural signals are input into the decoder, and the decoder generates new actions, thus forming a closed loop.
- Neural decoding refers to recording brain neural activity and using computational models to analyze the activity in order to infer the cognitive or behavioral process that the observer is experiencing or performing.
- the most critical part of the neural decoding process is the decoder design. How to design a decoder that can decode quickly and ensure decoding accuracy is very important in brain-computer interface applications.
- the present invention designs a new decoding paradigm based on deep learning and reinforcement learning, and designs a brain-computer interface decoder based on the closed-loop decoding idea, which controls the movement of the cursor on the screen by decoding neural signals.
- the decoder combines the characteristics of fast convergence of supervised deep learning methods and the advantage of reinforcement learning that does not require labels.
- both algorithms use the same neural network for learning.
- the later reinforcement learning algorithm directly uses the network trained by the supervised deep learning algorithm before. In this way, the reinforcement learning algorithm does not need to perform too much random exploration in the early stage, which greatly saves convergence time.
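- A minimal sketch of this warm start, assuming the decoder class and decoder.pth file from the training sketch above: the DQN Q-network shares the supervised decoder's architecture and simply loads its saved parameters, so little random exploration is needed at the start of the test phase.

```python
# Warm-starting the DQN Q-network from the supervised model (file name assumed).
import copy
import torch

q_net = MLPDecoder()                                  # same architecture as the supervised decoder
q_net.load_state_dict(torch.load("decoder.pth"))      # load supervised-trained parameters
target_net = copy.deepcopy(q_net)                     # a DQN target network can start from the same weights
target_net.eval()
opt = torch.optim.Adam(q_net.parameters(), lr=1e-4)   # optimizer for online updates during closed-loop use
```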
- neural signals are collected from the brain of an animal such as a monkey through neural microelectrodes and, after operations such as amplification, filtering, and denoising, are sent to the PC.
- the PC starts to execute the decoder code, and the decoder maps the neural signal to a specific action instruction, such as moving the cursor up or down.
- the actuator updates the screen in real time after executing the movement instruction, and the updated page is displayed on the monitor.
- after the display is updated, the monkey observes the change and generates a corresponding new neural signal, so the decoding forms a closed loop. Because the monkey's neurons are plastic, they autonomously adjust during this process so that the cursor moves to the specified position.
- the decoder can set constraints to help the monkey move the cursor to a fixed position. The monkey and the decoder adapt to each other and learn from each other to complete the same task.
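- The text does not specify the form of these constraints; one plausible sketch is an assisted-control rule that blends the decoded move with the direction toward the target, so the cursor is gently pulled toward the fixed position while the monkey and the decoder adapt to each other (the blending weight and step size below are assumptions).

```python
# Hypothetical assisted (constrained) cursor update; not specified in the source text.
import numpy as np

ACTION_VECTORS = {0: (0, 1), 1: (0, -1), 2: (-1, 0), 3: (1, 0)}  # up, down, left, right

def assisted_step(cursor, target, action, assist=0.3, step=10.0):
    decoded = np.array(ACTION_VECTORS[action], dtype=float)       # direction chosen by the decoder
    toward_target = np.asarray(target, dtype=float) - cursor      # direction of the constraint
    norm = np.linalg.norm(toward_target)
    if norm > 0:
        toward_target /= norm
    direction = (1 - assist) * decoded + assist * toward_target   # blend decoder output with the constraint
    return cursor + step * direction
```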
- the present invention mainly includes two parts, namely, a training part and a testing part.
- the monkey pushes the joystick to move the cursor to the specified position and is rewarded with juice.
- a supervised learning neural network is used as a decoder, because there are labels in the training part, and the target can be used as a supervised learning label in this process.
- the trained model can be saved.
- the monkey controls the movement of the cursor by imagination; it does not push the joystick but instead generates the intention to move the cursor by watching the cursor position on the screen.
- the decoder is replaced by a reinforcement learning algorithm to act as a decoder for decoding, and the reinforcement learning algorithm used here is a DQN algorithm.
- the network that maps states to actions is loaded directly with the model parameters trained by the supervised neural network in the training stage, so that early trial-and-error learning is avoided, which improves efficiency and accuracy and makes the system more practical.
- Reinforcement learning is an important branch of machine learning that aims to learn how to make optimal decisions through interaction with the environment.
- Reinforcement learning models usually consist of the following five basic elements:
- State: represents the state of the environment, such as the state of a board in a game or the sensor readings of a robot.
- Action: represents the operation performed by the agent in a certain state, such as the movement of chess pieces in a game or the movement of a robot.
- Reward: represents the feedback obtained by performing an action in a certain state, usually a scalar value. Rewards can be positive, negative, or zero, reflecting the quality of the action.
- Environment: refers to the external environment in which the agent is located, including all possible states and actions, as well as the feedback signals from the environment to the agent. The environment can be a real physical environment or a simulated environment, such as a game environment.
- Agent: an algorithm or model that learns a decision-making strategy. The agent interacts with the environment, continually observes the current state, makes decisions and performs actions, and receives reward or punishment signals from the environment as feedback on the quality of its decisions. The goal of the agent is to learn an optimal decision-making strategy that maximizes the long-term accumulated reward.
- the reinforcement learning model can be described by a mathematical model. This model can calculate the next action based on the current state and historical actions and rewards, as well as the feedback signals provided by the environment, thereby achieving autonomous decision-making and action.
- Common reinforcement learning algorithms include Q-learning, SARSA, deep reinforcement learning, etc. These algorithms can select different strategies and value functions according to different tasks to achieve optimal decisions.
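- To make the value-update idea behind these algorithms concrete, a toy tabular Q-learning update is shown below (illustrative only; the state and action counts are arbitrary).

```python
# Tabular Q-learning update: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
import numpy as np

n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.99          # learning rate and discount factor

def q_update(s, a, r, s_next):
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
```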
- Reinforcement learning models have been widely used in many fields, such as robot control, game AI, and autonomous driving. Through reinforcement learning, the agent can learn the optimal strategy from its interaction with the environment and continuously improve its performance.
- the decoder in FIG5 is regarded as the intelligent agent in FIG4 , the decoder generates an action, and the monkey brain and external devices such as the display are the external environment for the decoder.
- the neural signal transmitted by the monkey brain to the decoder in FIG5 corresponds to the state received by the intelligent agent in FIG4 .
- after the decoder receives the neural signal, it generates a corresponding action instruction.
- after the host receives the action instruction, it updates the display state.
- the monkey observes the corresponding change and the brain generates a new neural signal through encoding.
- the new neural signal can be regarded as a new state, thus forming a closed loop.
- the reinforcement learning theory and the brain-computer interface decoding paradigm are integrated. The brain and the decoder learn from each other and adapt to each other without the need for label signals, which is in line with the law of biological learning.
- the monkey used the joystick to control the cursor on the page to move.
- a circle of different colors appeared in each trial.
- the monkey knew that moving the current cursor to the target position would result in a juice reward.
- the neural signal was collected.
- the Utah electrode was used for collection.
- the neural signal needed to be amplified, denoised, and processed.
- the processed data was 256 channels, and each channel had a 30kHz sampling rate and half-precision floating-point data.
- the collected data was then downsampled and reduced to a sampling rate of 1kHz using a fixed interval sampling method. After that, the data was converted to the frequency domain, and specific frequency band features were extracted.
- the frequency band was set according to the specific task. These features were used as neural network inputs, and the target position was used as a label.
- the multi-layer perceptron network was used to implement the classification task for the above data, and the trained model was saved as a .pth file.
- in the test phase, the monkey needs to use imagination to control the cursor movement and finally reach the target area.
- the decoder first loads the previously saved .pth model; this is the deep Q-network (DQN) acting as the decoder.
- the DQN uses the same network structure as the previous supervised deep learning.
- the trained parameter model is then loaded to extract features from and decode the input neural signals and to output the corresponding action instructions.
- the output instructions update the cursor position on the page.
- the monkey knows the cursor change through visual feedback and generates new neural signals.
- the new neural signals are input into the decoder, and the decoder generates new actions, thus forming a closed loop.
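- The closed loop of the test phase can be sketched as below, reusing preprocess, MLPDecoder, the warm-started q_net, and the hypothetical assisted_step from the earlier sketches. acquire_signal, render_cursor, the reward shaping, and the success radius are assumptions, and the update shown is a simplified one-step DQN-style update without a replay buffer.

```python
# Hedged sketch of one closed-loop trial: neural features are the RL state, the
# Q-network decodes an action, the display is updated, and the reward derived from
# progress toward the target drives online updates of the decoder.
import random
import numpy as np
import torch
import torch.nn.functional as F

def reward_fn(cursor, target, prev_cursor):
    """Positive when the cursor moved closer to the target in this step."""
    return float(np.linalg.norm(prev_cursor - target) - np.linalg.norm(cursor - target))

def closed_loop_trial(q_net, opt, target, epsilon=0.05, gamma=0.99, max_steps=200):
    target = np.asarray(target, dtype=float)
    cursor = np.zeros(2)
    state = torch.tensor(preprocess(acquire_signal()), dtype=torch.float32)  # neural features = state
    for _ in range(max_steps):
        if random.random() < epsilon:
            action = random.randrange(4)                     # small residual exploration
        else:
            action = int(q_net(state).argmax())              # decode the action from the neural state
        prev = cursor.copy()
        cursor = assisted_step(cursor, target, action)       # constrained cursor update
        render_cursor(cursor)                                # monkey sees the change (visual feedback)
        next_state = torch.tensor(preprocess(acquire_signal()), dtype=torch.float32)
        r = reward_fn(cursor, target, prev)
        with torch.no_grad():                                # one-step temporal-difference target
            td_target = r + gamma * q_net(next_state).max()
        loss = F.mse_loss(q_net(state)[action], td_target)
        opt.zero_grad(); loss.backward(); opt.step()
        state = next_state
        if np.linalg.norm(cursor - target) < 15.0:           # assumed radius of the target area
            break
```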
- the key points and intended protection points of the present invention are:
- the supervised deep learning scheme is combined with the reinforcement learning scheme that does not require labels.
- the supervised deep learning decoding ensures that training can be carried out in a short time (establishing the mapping relationship between neural signals and actions).
- the reinforcement learning scheme ensures that the decoder can learn and decode well without requiring labels.
- compared with the traditional direct decoding method, the present invention achieves a better decoding effect and supports online decoding; compared with a general closed-loop adaptive decoder, it can quickly establish the mapping between neural signals and actions in the initial stage and can also be applied to unlabeled scenarios.
- the present invention has been tested and simulated, and its technical effect can be basically achieved.
- the present invention can be used not only in implantable brain-computer interfaces, but also in non-implantable brain-computer interfaces based on EEG data.
- a closed-loop adaptive brain-computer interface decoding device based on reinforcement learning is provided, referring to FIG7 , comprising:
- a preprocessing unit 201 is used to preprocess the collected neural signals of the test animal
- a new neural signal generating unit 202 is used to input the preprocessed neural signal into the decoder and map it into a specific action instruction; the test animal receives visual feedback on the executed action instruction and generates a new neural signal;
- the new action instruction generating unit 203 is used to input new neural signals to the decoder, and the decoder generates new action instructions, forming a closed loop, integrating the reinforcement learning theory and the brain-computer interface decoding paradigm.
- the closed-loop adaptive brain-computer interface decoding device based on reinforcement learning in the embodiment of the present invention inputs the preprocessed neural signal into the decoder to map it into specific action instructions.
- the test animal generates new neural signals through visual feedback on the executed specific action instructions.
- the new neural signals are input to the decoder, and the decoder generates new action instructions to form a closed loop.
- the reinforcement learning theory and the brain-computer interface decoding paradigm are integrated. The brain and the decoder learn from each other and adapt to each other without the need for label signals, which is in line with the law of biological learning.
- the device further includes:
- the signal collection unit 200 is used to collect neural signals from the test animal.
- Neural decoding refers to recording brain neural activity and using computational models to analyze the activity in order to infer the cognitive or behavioral process that the observer is experiencing or performing.
- the most critical part of the neural decoding process is the decoder design. How to design a decoder that can decode quickly and ensure decoding accuracy is very important in brain-computer interface applications.
- the present invention designs a new decoding paradigm based on deep learning and reinforcement learning, and designs a brain-computer interface decoder based on the closed-loop decoding idea, which controls the movement of the cursor on the screen by decoding neural signals.
- the decoder combines the characteristics of fast convergence of supervised deep learning methods and the advantage of reinforcement learning that does not require labels.
- both algorithms use the same neural network for learning.
- the later reinforcement learning algorithm directly uses the network trained by the supervised deep learning algorithm before. In this way, the reinforcement learning algorithm does not need to perform too much random exploration in the early stage, which greatly saves convergence time.
- neural signals are collected from the brain of an animal such as a monkey through neural microelectrodes and, after operations such as amplification, filtering, and denoising, are sent to the PC.
- the PC starts to execute the decoder code, and the decoder maps the neural signal to a specific action instruction, such as moving the cursor up or down.
- the actuator updates the screen in real time after executing the movement instruction, and the updated page is displayed on the monitor.
- after the display is updated, the monkey observes the change and generates a corresponding new neural signal, so the decoding forms a closed loop. Because the monkey's neurons are plastic, they autonomously adjust during this process so that the cursor moves to the specified position.
- the decoder can set constraints to help the monkey move the cursor to a fixed position. The monkey and the decoder adapt to each other and learn from each other to complete the same task.
- the present invention mainly includes two parts, namely, a training part and a testing part.
- the monkey pushes the joystick to move the cursor to the specified position and is rewarded with juice.
- a supervised learning neural network is used as a decoder, because there are labels in the training part, and the target can be used as a supervised learning label in this process.
- the trained model can be saved.
- the monkey controls the movement of the cursor by imagination; it does not push the joystick but instead generates the intention to move the cursor by observing the position of the cursor on the screen.
- the decoder is replaced by a reinforcement learning algorithm to act as a decoder for decoding, and the reinforcement learning algorithm used here is a DQN algorithm.
- the network that maps states to actions is loaded directly with the model parameters trained by the supervised neural network in the training stage, so that early trial-and-error learning is avoided, which improves efficiency and accuracy and makes the system more practical.
- Reinforcement learning is an important branch of machine learning that aims to learn how to make optimal decisions through interaction with the environment.
- Reinforcement learning models usually consist of the following five basic elements:
- State: represents the state of the environment, such as the state of a board in a game or the sensor readings of a robot.
- Action: represents the operation performed by the agent in a certain state, such as the movement of chess pieces in a game or the movement of a robot.
- Reward: represents the feedback obtained by performing an action in a certain state, usually a scalar value. Rewards can be positive, negative, or zero, reflecting the quality of the action.
- Environment: refers to the external environment in which the agent is located, including all possible states and actions, as well as the feedback signals from the environment to the agent. The environment can be a real physical environment or a simulated environment, such as a game environment.
- Agent: an algorithm or model that learns a decision-making strategy. The agent interacts with the environment, continually observes the current state, makes decisions and performs actions, and receives reward or punishment signals from the environment as feedback on the quality of its decisions. The goal of the agent is to learn an optimal decision-making strategy that maximizes the long-term accumulated reward.
- the reinforcement learning model can be described by a mathematical model. This model can calculate the next action based on the current state and historical actions and rewards, as well as the feedback signals provided by the environment, thereby achieving autonomous decision-making and action.
- Common reinforcement learning algorithms include Q-learning, SARSA, deep reinforcement learning, etc. These algorithms can select different strategies and value functions according to different tasks to achieve optimal decisions.
- Reinforcement learning models have been widely used in many fields, such as robot control, game AI, and autonomous driving. Through reinforcement learning, the agent can learn the optimal strategy from its interaction with the environment and continuously improve its performance.
- the decoder in FIG5 is regarded as the intelligent agent in FIG4 , the decoder generates an action, and the monkey brain and external devices such as the display are the external environment for the decoder.
- the neural signal transmitted by the monkey brain to the decoder in FIG5 corresponds to the state received by the intelligent agent in FIG4 .
- after the decoder receives the neural signal, it generates a corresponding action instruction.
- after the host receives the action instruction, it updates the display state.
- the monkey observes the corresponding change and the brain generates a new neural signal through encoding.
- the new neural signal can be regarded as a new state, thus forming a closed loop.
- the reinforcement learning theory and the brain-computer interface decoding paradigm are integrated. The brain and the decoder learn from each other and adapt to each other without the need for label signals, which is in line with the law of biological learning.
- the monkey used the joystick to control the cursor on the page to move.
- a circle of different colors appeared in each trial.
- the monkey knew that moving the current cursor to the target position would result in a juice reward.
- the neural signal was collected.
- the Utah electrode was used for collection.
- the neural signal needed to be amplified, denoised, and processed.
- the processed data was 256 channels, and each channel had a 30kHz sampling rate and half-precision floating-point data.
- the collected data was then downsampled and reduced to a sampling rate of 1kHz using a fixed interval sampling method. After that, the data was converted to the frequency domain, and specific frequency band features were extracted.
- the frequency band was set according to the specific task. These features were used as neural network inputs, and the target position was used as a label.
- the multi-layer perceptron network was used to implement the classification task for the above data, and the trained model was saved as a .pth file.
- in the test phase, the monkey needs to use imagination to control the cursor movement and finally reach the target area.
- the decoder first loads the previously saved .pth model; this is the deep Q-network (DQN) acting as the decoder.
- the DQN uses the same network structure as the previous supervised deep learning.
- the trained parameter model is then loaded to extract features from and decode the input neural signals and to output the corresponding action instructions.
- the output instructions update the cursor position on the page.
- the monkey knows the cursor change through visual feedback and generates new neural signals.
- the new neural signals are input into the decoder, and the decoder generates new actions, thus forming a closed loop.
- the key points and intended protection points of the present invention are:
- the supervised deep learning scheme is combined with the reinforcement learning scheme that does not require labels.
- the supervised deep learning decoding ensures that training can be carried out in a short time (establishing the mapping relationship between neural signals and actions).
- the reinforcement learning scheme ensures that the decoder can learn and decode well without requiring labels.
- compared with the traditional direct decoding method, the present invention achieves a better decoding effect and supports online decoding; compared with a general closed-loop adaptive decoder, it can quickly establish the mapping between neural signals and actions in the initial stage and can also be applied to unlabeled scenarios.
- the present invention has been tested and simulated, and its technical effect can be basically achieved.
- the present invention can be used not only in implantable brain-computer interfaces, but also in non-implantable brain-computer interfaces based on EEG data.
- a storage medium stores a program file capable of implementing any one of the above-mentioned closed-loop adaptive brain-computer interface decoding methods based on reinforcement learning.
- a processor is used to run a program, wherein when the program is running, any one of the above-mentioned closed-loop adaptive brain-computer interface decoding methods based on reinforcement learning is executed.
- the disclosed technical content can be implemented in other ways.
- the system embodiments described above are only schematic.
- the division of units can be a logical function division.
- multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed.
- the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through certain interfaces, or as indirect coupling or communication connections between units or modules, and may be electrical or take other forms.
- the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the present embodiment.
- each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
- the above-mentioned integrated unit may be implemented in the form of hardware or in the form of software functional units.
- if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
- the computer software product is stored in a storage medium, including several instructions for a computer device (which can be a personal computer, server or network device, etc.) to perform all or part of the steps of the methods of each embodiment of the present invention.
- the aforementioned storage medium includes: a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other media that can store program code.
Description
The present invention relates to the field of neural signal decoding, and in particular to a closed-loop adaptive brain-computer interface decoding method and device based on reinforcement learning.
A brain-machine interface decoder is a computational model designed to translate neural activity into instructions that humans can understand, so as to control external devices such as prosthetic limbs, wheelchairs, and computer games. This technology is an important component of the brain-machine interface (BMI). A brain-machine interface decoder works by converting neural signals from the brain into instructions for controlling external devices. These neural signals are usually recorded through implanted electrodes or with non-invasive neuroimaging techniques such as electroencephalography. By decoding and analyzing these signals, the computational model can recognize human intentions and behaviors and convert them into instructions that a machine can understand, thereby controlling external devices. Brain-machine interface decoders have applications in both clinical and laboratory settings. Clinically, they can help people who have lost motor function regain the ability to control prosthetic limbs or wheelchairs. In the laboratory, they can help neuroscientists better understand how the brain controls behavior and movement and provide basic research support for the development of new brain-machine interface technologies.
A closed-loop adaptive decoder (closed-loop decoder adaptation) is a brain-computer interface decoder paradigm that makes real-time adjustments based on feedback signals to improve the accuracy and reliability of the decoder. Compared with a traditional open-loop decoder, it can better adapt to different neural signals and actual usage conditions, thereby improving the performance and stability of the brain-computer interface system. A closed-loop adaptive decoder usually consists of two main parts: a decoder and a feedback control loop. The decoder converts neural signals into control signals, while the feedback control loop provides real-time feedback signals for adjustment. This adjustment can be achieved in a variety of ways, such as adjusting decoder parameters, selecting different neural-signal features, or adjusting the time delay of the feedback signal. Closed-loop adaptive decoders help overcome many common problems, such as unstable signals, signal loss, and nonlinear neural responses. They can also better adapt to individual differences and dynamically changing brain activity, thereby improving the accuracy and reliability of the decoder for controlling external devices or performing behavioral tasks. Closed-loop adaptive decoders have been widely used in brain-computer interface research and in many application scenarios, such as prosthetic control and brain-controlled games. In the future, as brain-computer interface technology continues to develop, closed-loop adaptive decoders will continue to play an important role in helping humans better control and interact with external devices and in providing deeper insight for neuroscience research. Current decoders fall mainly into two categories: traditional direct decoding and the newer closed-loop adaptive decoding paradigm. Traditional decoders that directly decode neural signals are used mainly for offline decoding of recorded data, and the decoding methods mostly use RNN networks, for example long short-term memory (LSTM) networks, as decoders to extract temporal information from the neural signals. As closed-loop adaptive decoders have emerged and achieved excellent results, more and more studies have turned to this new paradigm; for example, Wu, Zhang and others use it for decoding, with the decoder module mainly based on methods such as Kalman filtering or support vector regression.
Traditional decoders that decode directly are often used in offline scenarios. Because only the decoder unilaterally learns the brain's neural signals, learning efficiency is lower and decoding accuracy is worse than in the closed-loop adaptive decoder paradigm. Within the closed-loop adaptive paradigm, there are currently closed-loop adaptive decoders based on Kalman filter designs and methods based on traditional machine learning such as support vector regression and random forests; however, compared with artificial neural network methods, these methods learn only shallower content and cannot capture deep features well, so their performance does not match that of deep learning methods.
Summary of the invention
The embodiments of the present invention provide a closed-loop adaptive brain-computer interface decoding method and device based on reinforcement learning, in order to design a decoder that can decode quickly while ensuring decoding accuracy for application in brain-computer interfaces.
According to one embodiment of the present invention, a closed-loop adaptive brain-computer interface decoding method based on reinforcement learning is provided, comprising the following steps:
S101: preprocessing the collected neural signals of the test animal;
S102: inputting the preprocessed neural signals into the decoder and mapping them into specific action instructions, the test animal receiving visual feedback on the executed action instructions and generating new neural signals;
S103: the new neural signals are input to the decoder, which generates new action instructions, forming a closed loop that integrates reinforcement learning theory with the brain-computer interface decoding paradigm.
Furthermore, the method further comprises:
S100: collecting neural signals from the test animal.
Furthermore, in step S101, the preprocessing includes amplification, filtering, and denoising.
Furthermore, the specific action instructions include moving a cursor to a specified position; after the decoder generates a new action instruction, it assists the test animal in moving the cursor to the fixed position by setting constraints.
Furthermore, the method includes a training part and a testing part.
In the training part, the monkey is trained to push the joystick to move the cursor to the specified position and is rewarded for doing so; a supervised-learning neural network serves as the decoder in this stage, with the target used as the supervised-learning label. Training establishes a mapping between neural signals and actions, and the trained model is saved.
In the testing stage, the monkey controls the cursor movement through imagination, generating the intention to move the cursor by observing the cursor position on the screen; at this point the decoder is replaced with a reinforcement learning algorithm, which performs the decoding.
Furthermore, the reinforcement learning algorithm is DQN, in which the network that maps states to actions is initialized directly with the model parameters trained during the training phase by the supervised neural network.
Furthermore, in the training phase, the monkey uses a joystick to move the cursor on the page; in each trial a circle of a different color appears, and after training the monkey knows that moving the cursor from its current position to the target position will be rewarded.
At the beginning of each trial the neural signal is collected; the collected signal is amplified and denoised, yielding 256 channels of half-precision floating-point data sampled at 30 kHz per channel. The data are then downsampled to a 1 kHz sampling rate using fixed-interval sampling, converted to the frequency domain, and features in specific frequency bands are extracted. The frequency band is set according to the task; these features serve as the neural network input, the target position serves as the label, a multi-layer perceptron network is trained on this classification task, and the trained model is saved as a .pth file.
Furthermore, in the testing phase, the monkey controls the cursor movement through imagination until it reaches the target area.
In this process the decoder first loads the previously saved .pth model, namely the DQN network acting as the decoder; the DQN uses the same network structure as the previous supervised deep learning, and the trained parameter model is loaded to extract features from and decode the input neural signals and to output the corresponding action instructions.
The output instructions update the cursor position on the page; the monkey perceives the cursor change through visual feedback and generates new neural signals, which are input into the decoder, which in turn generates new actions, thus forming a closed loop.
According to another embodiment of the present invention, a closed-loop adaptive brain-computer interface decoding device based on reinforcement learning is provided, comprising:
a preprocessing unit, used for preprocessing the collected neural signals of the test animal;
a new neural signal generating unit, used for inputting the preprocessed neural signals into the decoder and mapping them into specific action instructions, the test animal receiving visual feedback on the executed action instructions and generating new neural signals;
a new action instruction generating unit, used for inputting the new neural signals into the decoder, which generates new action instructions, forming a closed loop that integrates reinforcement learning theory with the brain-computer interface decoding paradigm.
Furthermore, the device further comprises:
a signal acquisition unit, used for collecting neural signals from the test animal.
A storage medium is provided, storing a program file capable of implementing any one of the above closed-loop adaptive brain-computer interface decoding methods based on reinforcement learning.
A processor is provided, used for running a program, wherein when the program runs, any one of the above closed-loop adaptive brain-computer interface decoding methods based on reinforcement learning is executed.
In the closed-loop adaptive brain-computer interface decoding method and device based on reinforcement learning of the embodiments of the present invention, the preprocessed neural signals are input into the decoder and mapped into specific action instructions; the test animal receives visual feedback on the executed action instructions and generates new neural signals; the new neural signals are input into the decoder, which generates new action instructions, forming a closed loop. Reinforcement learning theory and the brain-computer interface decoding paradigm are thereby integrated: the brain and the decoder learn from and adapt to each other without requiring label signals, which accords with the laws of biological learning.
The drawings described herein are provided for a further understanding of the present invention and constitute a part of this application; the exemplary embodiments of the present invention and their descriptions are used to explain the present invention and do not constitute an improper limitation of it. In the drawings:
FIG1 is a flow chart of the closed-loop adaptive brain-computer interface decoding method based on reinforcement learning of the present invention;
FIG2 is a preferred flow chart of the closed-loop adaptive brain-computer interface decoding method based on reinforcement learning of the present invention;
FIG3 is a schematic diagram of the closed-loop adaptive decoder in the present invention;
FIG4 is a diagram of the reinforcement learning model in the present invention;
FIG5 is a demonstration diagram of the closed-loop adaptive decoder in the present invention;
FIG6 is a scene diagram of the macaque training stage in the present invention;
FIG7 is a module diagram of the closed-loop adaptive brain-computer interface decoding device based on reinforcement learning of the present invention;
FIG8 is a preferred module diagram of the closed-loop adaptive brain-computer interface decoding device based on reinforcement learning of the present invention.
为了使本技术领域的人员更好地理解本发明方案,下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所 描述的实施例仅仅是本发明一部分的实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都应当属于本发明保护的范围。In order to enable those skilled in the art to better understand the solution of the present invention, the technical solution in the embodiment of the present invention will be clearly and completely described below in conjunction with the accompanying drawings in the embodiment of the present invention. The described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by ordinary technicians in this field without creative work should fall within the scope of protection of the present invention.
需要说明的是,本发明的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换,以便这里描述的本发明的实施例能够以除了在这里图示或描述的那些以外的顺序实施。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,例如,包含了一系列步骤或单元的过程、方法、系统、产品或设备不必限于清楚地列出的那些步骤或单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或单元。It should be noted that the terms "first", "second", etc. in the specification and claims of the present invention and the above-mentioned drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that the data used in this way can be interchanged where appropriate, so that the embodiments of the present invention described herein can be implemented in an order other than those illustrated or described herein. In addition, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusions, for example, a process, method, system, product or device that includes a series of steps or units is not necessarily limited to those steps or units that are clearly listed, but may include other steps or units that are not clearly listed or inherent to these processes, methods, products or devices.
Embodiment 1
According to an embodiment of the present invention, a closed-loop adaptive brain-machine interface decoding method based on reinforcement learning is provided. Referring to FIG. 1, the method comprises the following steps:
S101: preprocessing the collected neural signals of the test animal;
S102: inputting the preprocessed neural signals into the decoder to map them into specific action instructions, where the test animal receives visual feedback on the executed action instructions and generates new neural signals;
S103: feeding the new neural signals into the decoder, which generates new action instructions, thereby forming a closed loop that integrates reinforcement learning theory with the brain-machine interface decoding paradigm.
In the closed-loop adaptive brain-machine interface decoding method based on reinforcement learning of this embodiment, the preprocessed neural signals are input into the decoder and mapped into specific action instructions; the test animal receives visual feedback on the executed action instructions and generates new neural signals; the new neural signals are fed back into the decoder, which produces new action instructions, forming a closed loop. Reinforcement learning theory is thereby integrated with the brain-machine interface decoding paradigm: the brain and the decoder learn from and adapt to each other without requiring label signals, which is consistent with how biological learning works.
Referring to FIG. 2, the method further comprises:
S100: collecting neural signals from the test animal.
In step S101, the preprocessing includes amplification, filtering, and denoising.
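As a minimal sketch of what the filtering and denoising stage might look like in software (amplification is assumed to happen in the recording hardware; the cut-off frequencies and the 50 Hz notch below are illustrative assumptions rather than values specified by the present invention):

```python
import numpy as np
from scipy.signal import butter, iirnotch, filtfilt, sosfiltfilt

def preprocess_channel(x: np.ndarray, fs: float = 30_000.0) -> np.ndarray:
    """Filter and denoise one amplified channel (assumed cut-offs, 50 Hz notch)."""
    # Band-pass 1-300 Hz; also acts as anti-aliasing before later downsampling.
    sos = butter(4, [1.0, 300.0], btype="bandpass", fs=fs, output="sos")
    x = sosfiltfilt(sos, x)
    # Notch out power-line interference (50 Hz assumed).
    bn, an = iirnotch(w0=50.0, Q=30.0, fs=fs)
    return filtfilt(bn, an, x)
```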
The specific action instructions include moving a cursor to a specified position. After the decoder generates a new action instruction, constraints can be set to assist the test animal in moving the cursor to the fixed position.
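One possible form of such an assistive constraint, sketched here purely as an assumption (the blending weight, step size, and 2-D cursor representation are not specified by the present invention), is to blend the decoded movement direction with the direction toward the target so that the cursor is gently pulled toward the fixed position:

```python
import numpy as np

def assisted_step(cursor: np.ndarray, decoded_dir: np.ndarray,
                  target: np.ndarray, assist: float = 0.3,
                  step_size: float = 5.0) -> np.ndarray:
    """Blend the decoder output with the target direction; `assist` in [0, 1] is an assumed constraint weight."""
    to_target = target - cursor
    to_target = to_target / (np.linalg.norm(to_target) + 1e-9)
    direction = (1.0 - assist) * decoded_dir + assist * to_target
    direction = direction / (np.linalg.norm(direction) + 1e-9)
    return cursor + step_size * direction  # new cursor position sent to the display
```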
The method includes a training part and a testing part.
In the training part, the monkey is trained to push a joystick to control the cursor so that it reaches the specified position, and the monkey is rewarded when it does so. In the training stage a supervised-learning neural network serves as the decoder, with the target used as the supervision label; training establishes a mapping between neural signals and actions, and the trained model is saved.
In the testing stage, the monkey controls the cursor movement through imagination: it generates the intention to move the cursor by watching the cursor position on the screen. At this point the decoder is replaced by a reinforcement learning algorithm, which performs the decoding.
The reinforcement learning algorithm is DQN, in which the network that maps states to actions directly loads the model parameters trained in the training stage by the supervised-learning neural network.
In other embodiments, the reinforcement learning algorithm may also be Q-learning, SARSA, or another deep reinforcement learning algorithm.
In the training stage, the monkey moves the on-screen cursor with a joystick. In each trial a circle of a different color appears, and the trained monkey knows that moving the cursor from its current position to the target position will yield a reward.
Neural signals are collected at the beginning of each trial. The collected signals are amplified and denoised, yielding 256-channel half-precision floating-point data sampled at 30 kHz per channel. The data are then downsampled to a 1 kHz sampling rate using fixed-interval sampling and converted to the frequency domain, and features of specific frequency bands (set according to the task) are extracted. These features serve as the neural network input and the target position serves as the label; a multi-layer perceptron performs the classification task on these data, and the trained model is saved as a .pth file.
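The following sketch illustrates, under stated assumptions, one possible implementation of the feature extraction and supervised training described above: fixed-interval downsampling from 30 kHz to 1 kHz, band-power features taken from the magnitude spectrum, and a multi-layer perceptron trained against the target position as the label. The frequency bands, layer sizes, number of target classes, and the file name supervised_decoder.pth are hypothetical.

```python
import numpy as np
import torch
import torch.nn as nn

def extract_features(raw: np.ndarray, fs_in: int = 30_000, fs_out: int = 1_000,
                     bands=((8, 30), (70, 200))) -> np.ndarray:
    """raw: (256, n_samples) amplified/denoised trial data -> band-power feature vector."""
    step = fs_in // fs_out                       # fixed-interval downsampling (30 kHz -> 1 kHz)
    sig = raw[:, ::step].astype(np.float32)
    spec = np.abs(np.fft.rfft(sig, axis=1))      # magnitude spectrum per channel
    freqs = np.fft.rfftfreq(sig.shape[1], d=1.0 / fs_out)
    feats = [spec[:, (freqs >= lo) & (freqs < hi)].mean(axis=1) for lo, hi in bands]
    return np.concatenate(feats)                 # (256 * n_bands,) = (512,) with two bands

# Assumed setup: 512-dimensional features, 4 possible target positions.
model = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 4))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_step(features: torch.Tensor, target_idx: torch.Tensor) -> float:
    """features: (batch, 512) float tensor; target_idx: (batch,) class indices of the target position."""
    optimizer.zero_grad()
    loss = loss_fn(model(features), target_idx)  # target position acts as the supervision label
    loss.backward()
    optimizer.step()
    return loss.item()

# After training, the model is saved for reuse by the reinforcement learning decoder.
torch.save(model.state_dict(), "supervised_decoder.pth")
```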
In the testing stage, the monkey controls the cursor movement through imagination until the cursor reaches the target area.
In this process the decoder first loads the previously saved .pth model; specifically, a DQN network acts as the decoder. The DQN uses the same network structure as the earlier supervised deep learning model, loads the trained parameters, extracts features from and decodes the input neural signals, and outputs the corresponding action instructions.
The output instructions update the cursor position on the screen. The monkey perceives the cursor change through visual feedback and generates new neural signals; the new neural signals are fed into the decoder, which in turn produces new actions, thus forming a closed loop.
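A compact sketch of how this testing-stage loop could look in code is given below: the DQN Q-network reuses the architecture and saved parameters of the supervised model, each step maps the current neural feature vector to a cursor instruction, and the reward received when the cursor reaches the target drives a temporal-difference update without any labels. The action set, reward scheme, hyperparameters, and file name are illustrative assumptions.

```python
import copy
import torch
import torch.nn as nn

# Q-network reuses the supervised architecture and its trained weights (assumed file name).
q_net = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 4))
q_net.load_state_dict(torch.load("supervised_decoder.pth"))
target_net = copy.deepcopy(q_net).eval()
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-4)
GAMMA = 0.9
ACTIONS = ("up", "down", "left", "right")  # assumed cursor instruction set

def decode_step(state: torch.Tensor) -> int:
    """Map one neural feature vector (shape (512,)) to a cursor instruction index."""
    with torch.no_grad():
        return int(q_net(state).argmax().item())

def dqn_update(state, action, reward, next_state, done):
    """Single-transition temporal-difference update driven only by reward feedback."""
    q = q_net(state)[action]
    with torch.no_grad():
        target = reward + (0.0 if done else GAMMA * target_net(next_state).max())
    loss = (q - target) ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```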
The closed-loop adaptive brain-machine interface decoding method based on reinforcement learning of the present invention is described in detail below with a specific embodiment:
Neural decoding refers to recording brain neural activity and analyzing it with computational models in order to infer the cognitive or behavioral process that the observed subject is experiencing or performing. The most critical part of neural decoding is decoder design; designing a decoder that decodes quickly while maintaining decoding accuracy is very important in brain-machine interface applications.
Based on the problems of the decoders described above, the present invention designs a new decoding paradigm based on deep learning and reinforcement learning: a brain-machine interface decoder built on the closed-loop decoding idea, which controls the movement of an on-screen cursor by decoding neural signals. The decoder combines the fast convergence of supervised deep learning with the advantage that reinforcement learning requires no labels. Both algorithms use the same neural network; in the later stage the reinforcement learning algorithm directly reuses the network previously trained by the supervised deep learning algorithm, so that little random exploration is needed in the early stage of reinforcement learning, which greatly reduces convergence time.
The basic content of the technical solution of the present invention includes:
As shown in FIG. 3, neural signals are collected from the brain of an animal, for example a monkey, through neural microelectrodes and, after amplification, filtering, denoising, and other operations, are sent to a PC. The PC executes the decoder code, and the decoder maps the neural signals to a specific action instruction, for example moving the cursor up or down. After the actuator executes the movement instruction, the screen is updated in real time and the updated page is displayed on the monitor. By observing the updated page, the monkey generates corresponding new neural signals, and the decoding thus forms a closed loop. Because the monkey's neurons are plastic, the monkey adjusts autonomously during this process so as to move the cursor to the specified position, while the decoder can assist the monkey in moving the cursor to the fixed position by setting constraints. The monkey and the decoder adapt to and learn from each other to complete the same task.
The present invention mainly includes two parts: a training part and a testing part. In the training part, the monkey pushes a joystick to control the cursor so that it reaches the specified position, and the monkey is then rewarded with juice. In the training stage a supervised-learning neural network serves as the decoder: because labels are available in the training part, the target can be used as the supervision label, training establishes a mapping between neural signals and actions, and the trained model is saved. In the testing stage the monkey controls the cursor through imagination; it does not push the joystick but generates the intention to move the cursor by watching the cursor position on the screen. The decoder is then replaced by a reinforcement learning algorithm, here DQN, in which the network that maps states to actions directly loads the model parameters trained in the training stage by the supervised-learning neural network. This avoids trial-and-error learning, thereby improving efficiency and accuracy and making the system more practical.
Reinforcement learning is introduced first. Reinforcement learning is an important branch of machine learning that aims to learn how to make optimal decisions through interaction with the environment. A reinforcement learning model usually consists of the following five basic elements:
State: the state of the environment, for example the board state in a game or the sensor readings of a robot.
Action: the operation performed by the agent in a given state, for example moving a piece in a game or a robot movement.
Reward: the feedback obtained by performing an action in a given state, usually a scalar value. Rewards can be positive, negative, or zero, reflecting how good the action was.
Environment: the external world the agent is situated in, including all possible states and actions as well as the feedback signals the environment returns to the agent. The environment can be a real physical environment or a simulated one, for example a game environment.
Agent: an algorithm or model that learns a decision-making policy. By interacting with the environment, the agent continually observes the current state, makes decisions, and executes actions, while receiving reward or punishment signals from the environment as feedback on those decisions. The agent's goal is to learn an optimal decision-making policy that maximizes the long-term cumulative reward.
Based on these elements, a reinforcement learning model can be described mathematically. The model computes the next action from the current state, the history of actions and rewards, and the feedback signals provided by the environment, thereby achieving autonomous decision-making and action. Common reinforcement learning algorithms include Q-learning, SARSA, and deep reinforcement learning. Depending on the task, these algorithms adopt different policies and value functions to achieve optimal decisions. Reinforcement learning models are widely used in many fields, such as robot control, game AI, and autonomous driving. Through reinforcement learning, the agent can learn an optimal policy from interaction with the environment and continually improve its performance.
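As a purely illustrative toy example of these five elements interacting (unrelated to any specific task of the present invention), the loop below shows an agent observing states, choosing actions, and receiving rewards from a simple environment:

```python
import random

class LineEnvironment:
    """Toy environment: integer positions on a line; reaching position 5 gives a reward."""
    def __init__(self):
        self.state = 0

    def step(self, action: int):
        self.state += 1 if action == 1 else -1    # action 1 = move right, 0 = move left
        reward = 1.0 if self.state == 5 else 0.0  # reward signal from the environment
        done = self.state == 5
        return self.state, reward, done           # new state, reward, episode finished

env = LineEnvironment()
state, done = env.state, False
for _ in range(1000):                             # the agent's interaction loop
    action = random.choice([0, 1])                # placeholder policy; a real agent learns this
    state, reward, done = env.step(action)
    if done:
        break
```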
As shown in FIG. 5, in the present invention the decoder in FIG. 5 is regarded as the agent in FIG. 4: the decoder produces actions, while the monkey's brain and external devices such as the monitor constitute the external environment of the decoder. The neural signal transmitted from the monkey's brain to the decoder in FIG. 5 corresponds to the state received by the agent in FIG. 4. When the decoder receives a neural signal it generates a corresponding action instruction; the host updates the monitor state upon receiving the instruction; after the monitor state changes, the monkey observes the change and its brain encodes a new neural signal, which can be regarded as a new state, thus forming a closed loop. Reinforcement learning theory is thereby integrated with the brain-machine interface decoding paradigm: the brain and the decoder learn from and adapt to each other without requiring label signals, which is consistent with how biological learning works.
As shown in FIG. 6, in the training stage the monkey moves the on-screen cursor with a joystick. In each trial a circle of a different color appears, and the trained monkey knows that moving the cursor from its current position to the target position will yield a juice reward. Neural signals are collected at the beginning of each trial using Utah electrodes. The collected signals are amplified, denoised, and otherwise processed, yielding 256-channel half-precision floating-point data sampled at 30 kHz per channel. The data are then downsampled to a 1 kHz sampling rate using fixed-interval sampling and converted to the frequency domain, and features of specific frequency bands (set according to the task) are extracted. These features serve as the neural network input and the target position serves as the label; a multi-layer perceptron performs the classification task on these data, and the trained model is saved as a .pth file.
In the testing stage, the monkey needs to control the cursor through imagination until it reaches the target area. In this process the decoder first loads the previously saved .pth model; here a Deep Q-Network (DQN) acts as the decoder, using the same network structure as the earlier supervised deep learning model. It then loads the trained parameters, extracts features from and decodes the input neural signals, and outputs the corresponding action instructions. The output instructions update the cursor position on the screen; the monkey perceives the cursor change through visual feedback and generates new neural signals, which are fed into the decoder, which in turn produces new actions, thus forming a closed loop.
The key points of the present invention and the points to be protected are:
A supervised deep learning scheme is combined with a label-free reinforcement learning scheme. In the training stage (joystick control), supervised deep learning decoding ensures that the mapping between neural signals and actions can be trained in a short time; in the testing stage (control by intention), the reinforcement learning scheme ensures that the decoder can learn and decode well without labels.
Compared with the prior art, the advantages of the present invention are:
Compared with traditional direct decoding methods, the present invention achieves a better decoding effect and supports online decoding. Compared with general closed-loop adaptive decoders, the present invention can quickly establish the mapping from neural signals to actions in the initial stage and can also be applied to label-free scenarios.
The present invention has been verified by experiment and simulation, and the model has been tested; the technical effects of the present invention can basically be achieved. The present invention can be used not only in implantable brain-machine interfaces but also in non-implantable brain-machine interfaces based on data such as EEG.
Embodiment 2
According to another embodiment of the present invention, a closed-loop adaptive brain-machine interface decoding device based on reinforcement learning is provided. Referring to FIG. 7, the device comprises:
a preprocessing unit 201, configured to preprocess the collected neural signals of the test animal;
a new neural signal generating unit 202, configured to input the preprocessed neural signals into the decoder to map them into specific action instructions, where the test animal receives visual feedback on the executed action instructions and generates new neural signals;
a new action instruction generating unit 203, configured to feed the new neural signals into the decoder, which generates new action instructions, thereby forming a closed loop that integrates reinforcement learning theory with the brain-machine interface decoding paradigm.
In the closed-loop adaptive brain-machine interface decoding device based on reinforcement learning of this embodiment, the preprocessed neural signals are input into the decoder and mapped into specific action instructions; the test animal receives visual feedback on the executed action instructions and generates new neural signals; the new neural signals are fed back into the decoder, which produces new action instructions, forming a closed loop. Reinforcement learning theory is thereby integrated with the brain-machine interface decoding paradigm: the brain and the decoder learn from and adapt to each other without requiring label signals, which is consistent with how biological learning works.
Referring to FIG. 8, the device further comprises:
a signal collection unit 200, configured to collect neural signals from the test animal.
The closed-loop adaptive brain-machine interface decoding device based on reinforcement learning of the present invention is described in detail below with a specific embodiment:
Neural decoding refers to recording brain neural activity and analyzing it with computational models in order to infer the cognitive or behavioral process that the observed subject is experiencing or performing. The most critical part of neural decoding is decoder design; designing a decoder that decodes quickly while maintaining decoding accuracy is very important in brain-machine interface applications.
Based on the problems of the decoders described above, the present invention designs a new decoding paradigm based on deep learning and reinforcement learning: a brain-machine interface decoder built on the closed-loop decoding idea, which controls the movement of an on-screen cursor by decoding neural signals. The decoder combines the fast convergence of supervised deep learning with the advantage that reinforcement learning requires no labels. Both algorithms use the same neural network; in the later stage the reinforcement learning algorithm directly reuses the network previously trained by the supervised deep learning algorithm, so that little random exploration is needed in the early stage of reinforcement learning, which greatly reduces convergence time.
The basic content of the technical solution of the present invention includes:
As shown in FIG. 3, neural signals are collected from the brain of an animal, for example a monkey, through neural microelectrodes and, after amplification, filtering, denoising, and other operations, are sent to a PC. The PC executes the decoder code, and the decoder maps the neural signals to a specific action instruction, for example moving the cursor up or down. After the actuator executes the movement instruction, the screen is updated in real time and the updated page is displayed on the monitor. By observing the updated page, the monkey generates corresponding new neural signals, and the decoding thus forms a closed loop. Because the monkey's neurons are plastic, the monkey adjusts autonomously during this process so as to move the cursor to the specified position, while the decoder can assist the monkey in moving the cursor to the fixed position by setting constraints. The monkey and the decoder adapt to and learn from each other to complete the same task.
The present invention mainly includes two parts: a training part and a testing part. In the training part, the monkey pushes a joystick to control the cursor so that it reaches the specified position, and the monkey is then rewarded with juice. In the training stage a supervised-learning neural network serves as the decoder: because labels are available in the training part, the target can be used as the supervision label, training establishes a mapping between neural signals and actions, and the trained model is saved. In the testing stage the monkey controls the cursor through imagination; it does not push the joystick but generates the intention to move the cursor by watching the cursor position on the screen. The decoder is then replaced by a reinforcement learning algorithm, here DQN, in which the network that maps states to actions directly loads the model parameters trained in the training stage by the supervised-learning neural network. This avoids trial-and-error learning, thereby improving efficiency and accuracy and making the system more practical.
Reinforcement learning is introduced first. Reinforcement learning is an important branch of machine learning that aims to learn how to make optimal decisions through interaction with the environment. A reinforcement learning model usually consists of the following five basic elements:
State: the state of the environment, for example the board state in a game or the sensor readings of a robot.
Action: the operation performed by the agent in a given state, for example moving a piece in a game or a robot movement.
Reward: the feedback obtained by performing an action in a given state, usually a scalar value. Rewards can be positive, negative, or zero, reflecting how good the action was.
Environment: the external world the agent is situated in, including all possible states and actions as well as the feedback signals the environment returns to the agent. The environment can be a real physical environment or a simulated one, for example a game environment.
Agent: an algorithm or model that learns a decision-making policy. By interacting with the environment, the agent continually observes the current state, makes decisions, and executes actions, while receiving reward or punishment signals from the environment as feedback on those decisions. The agent's goal is to learn an optimal decision-making policy that maximizes the long-term cumulative reward.
Based on these elements, a reinforcement learning model can be described mathematically. The model computes the next action from the current state, the history of actions and rewards, and the feedback signals provided by the environment, thereby achieving autonomous decision-making and action. Common reinforcement learning algorithms include Q-learning, SARSA, and deep reinforcement learning. Depending on the task, these algorithms adopt different policies and value functions to achieve optimal decisions. Reinforcement learning models are widely used in many fields, such as robot control, game AI, and autonomous driving. Through reinforcement learning, the agent can learn an optimal policy from interaction with the environment and continually improve its performance.
As shown in FIG. 5, in the present invention the decoder in FIG. 5 is regarded as the agent in FIG. 4: the decoder produces actions, while the monkey's brain and external devices such as the monitor constitute the external environment of the decoder. The neural signal transmitted from the monkey's brain to the decoder in FIG. 5 corresponds to the state received by the agent in FIG. 4. When the decoder receives a neural signal it generates a corresponding action instruction; the host updates the monitor state upon receiving the instruction; after the monitor state changes, the monkey observes the change and its brain encodes a new neural signal, which can be regarded as a new state, thus forming a closed loop. Reinforcement learning theory is thereby integrated with the brain-machine interface decoding paradigm: the brain and the decoder learn from and adapt to each other without requiring label signals, which is consistent with how biological learning works.
As shown in FIG. 6, in the training stage the monkey moves the on-screen cursor with a joystick. In each trial a circle of a different color appears, and the trained monkey knows that moving the cursor from its current position to the target position will yield a juice reward. Neural signals are collected at the beginning of each trial using Utah electrodes. The collected signals are amplified, denoised, and otherwise processed, yielding 256-channel half-precision floating-point data sampled at 30 kHz per channel. The data are then downsampled to a 1 kHz sampling rate using fixed-interval sampling and converted to the frequency domain, and features of specific frequency bands (set according to the task) are extracted. These features serve as the neural network input and the target position serves as the label; a multi-layer perceptron performs the classification task on these data, and the trained model is saved as a .pth file.
In the testing stage, the monkey needs to control the cursor through imagination until it reaches the target area. In this process the decoder first loads the previously saved .pth model; here a Deep Q-Network (DQN) acts as the decoder, using the same network structure as the earlier supervised deep learning model. It then loads the trained parameters, extracts features from and decodes the input neural signals, and outputs the corresponding action instructions. The output instructions update the cursor position on the screen; the monkey perceives the cursor change through visual feedback and generates new neural signals, which are fed into the decoder, which in turn produces new actions, thus forming a closed loop.
The key points of the present invention and the points to be protected are:
A supervised deep learning scheme is combined with a label-free reinforcement learning scheme. In the training stage (joystick control), supervised deep learning decoding ensures that the mapping between neural signals and actions can be trained in a short time; in the testing stage (control by intention), the reinforcement learning scheme ensures that the decoder can learn and decode well without labels.
Compared with the prior art, the advantages of the present invention are:
Compared with traditional direct decoding methods, the present invention achieves a better decoding effect and supports online decoding. Compared with general closed-loop adaptive decoders, the present invention can quickly establish the mapping from neural signals to actions in the initial stage and can also be applied to label-free scenarios.
The present invention has been verified by experiment and simulation, and the model has been tested; the technical effects of the present invention can basically be achieved. The present invention can be used not only in implantable brain-machine interfaces but also in non-implantable brain-machine interfaces based on data such as EEG.
Embodiment 3
A storage medium storing a program file capable of implementing any one of the above closed-loop adaptive brain-machine interface decoding methods based on reinforcement learning.
Embodiment 4
A processor configured to run a program, wherein when the program runs, any one of the above closed-loop adaptive brain-machine interface decoding methods based on reinforcement learning is executed.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate that one embodiment is better than another.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, reference may be made to the relevant descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed technical content may be implemented in other ways. The system embodiments described above are merely illustrative. For example, the division into units may be a division by logical function, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or in other forms.
The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically separately, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
The above is only a preferred embodiment of the present invention. It should be pointed out that persons of ordinary skill in the art may make several improvements and modifications without departing from the principles of the present invention, and such improvements and modifications shall also fall within the scope of protection of the present invention.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2023/083704 WO2024197457A1 (en) | 2023-03-24 | 2023-03-24 | Closed-loop adaptive brain‑machine interface decoding method and device based on reinforcement learning |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2023/083704 WO2024197457A1 (en) | 2023-03-24 | 2023-03-24 | Closed-loop adaptive brain‑machine interface decoding method and device based on reinforcement learning |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024197457A1 true WO2024197457A1 (en) | 2024-10-03 |
Family
ID=92902950
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2023/083704 Pending WO2024197457A1 (en) | 2023-03-24 | 2023-03-24 | Closed-loop adaptive brain‑machine interface decoding method and device based on reinforcement learning |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2024197457A1 (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119884886A (en) * | 2025-03-19 | 2025-04-25 | 中国人民解放军国防科技大学 | Method and system for decoding free association semantics by utilizing stereotactic electroencephalogram signals |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2014025772A2 (en) * | 2012-08-06 | 2014-02-13 | University Of Miami | Systems and methods for responsive neurorehabilitation |
| US20190025917A1 (en) * | 2014-12-12 | 2019-01-24 | The Research Foundation For The State University Of New York | Autonomous brain-machine interface |
| US20190159715A1 (en) * | 2016-08-05 | 2019-05-30 | The Regents Of The University Of California | Methods of cognitive fitness detection and training and systems for practicing the same |
| CN110534180A (en) * | 2019-08-20 | 2019-12-03 | 西安电子科技大学 | The man-machine coadaptation Mental imagery brain machine interface system of deep learning and training method |
| CN112884148A (en) * | 2021-03-10 | 2021-06-01 | 中国人民解放军军事科学院国防科技创新研究院 | Hybrid reinforcement learning training method and device embedded with multi-step rules and storage medium |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US9050200B2 (en) | System and method for brain machine interface (BMI) control using reinforcement learning | |
| US20210282696A1 (en) | Autonomous brain-machine interface | |
| Khamassi et al. | Robot cognitive control with a neurophysiologically inspired reinforcement learning model | |
| Ortiz-Catalan et al. | BioPatRec: A modular research platform for the control of artificial limbs based on pattern recognition algorithms | |
| Shenoy et al. | Cortical control of arm movements: a dynamical systems perspective | |
| DEL R. MILLÁN et al. | Non-invasive brain-machine interaction | |
| Tsui et al. | A self-paced brain–computer interface for controlling a robot simulator: an online event labelling paradigm and an extended Kalman filter based algorithm for online training | |
| Shahnazian et al. | Distributed representations of action sequences in anterior cingulate cortex: A recurrent neural network approach | |
| Pohlmeyer et al. | Brain-machine interface control of a robot arm using actor-critic rainforcement learning | |
| Sanchez et al. | Exploiting co-adaptation for the design of symbiotic neuroprosthetic assistants | |
| CN111805546A (en) | A method and system for human-multi-robot shared control based on brain-computer interface | |
| WO2024197457A1 (en) | Closed-loop adaptive brain‑machine interface decoding method and device based on reinforcement learning | |
| Prins et al. | A confidence metric for using neurobiological feedback in actor-critic reinforcement learning based brain-machine interfaces | |
| CN116430993A (en) | A closed-loop adaptive brain-computer interface decoding method and device based on reinforcement learning | |
| Karrenbach et al. | Deep learning and session-specific rapid recalibration for dynamic hand gesture recognition from EMG | |
| Talanov et al. | Neuropunk revolution. hacking cognitive systems towards cyborgs 3.0 | |
| Rueckauer et al. | An in-silico framework for modeling optimal control of neural systems | |
| CN115617159A (en) | Reinforcement learning based adaptive state observation for brain-computer interface | |
| Pawlak et al. | Neuromorphic algorithms for brain implants: a review | |
| CN119889574A (en) | Rehabilitation training method and system based on biofeedback | |
| Bae et al. | Reinforcement learning via kernel temporal difference | |
| Gad | Brain-Computer Interface for Supervisory Controls of Unmanned Aerial Vehicles | |
| WO2024197695A1 (en) | Brain computer interface-based intent prediction method and apparatus, device, and storage medium | |
| Sarda et al. | Hand kinematics and decoding hindlimb kinematics using local field potentials using a deep neural network decoding framework | |
| Gao et al. | Bootstrapping Adaptive Human-Machine Interfaces with Offline Reinforcement Learning |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23929061; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |