Paternain et al., 2018 - Google Patents

Learning policies for Markov decision processes in continuous spaces

Document ID: 10833729593636862068
Author: Paternain S; Bazerque J; Small A; Ribeiro A
Publication year: 2018
Publication venue: 2018 IEEE Conference on Decision and Control (CDC)

Snippet

In this work we consider the problem of policy optimization in the context of reinforcement learning. In order to avoid discretization, we select the optimal policy to be a continuous function belonging to a reproducing Kernel Hilbert Space (RKHS) which maximizes an …
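The snippet's central idea, representing the policy as a continuous function in an RKHS rather than discretizing the state space, can be illustrated with a minimal Python sketch. It assumes a Gaussian kernel, a scalar action, and a generic functional stochastic-gradient step. The names `KernelPolicy`, `gaussian_kernel`, and `functional_gradient_step` are hypothetical, and the objective the paper maximizes (elided in the snippet) is not modeled: `grad_scalar` simply stands in for whatever gradient estimate the actual method would supply.

```python
import math
from typing import List, Sequence


def gaussian_kernel(x: Sequence[float], y: Sequence[float], bandwidth: float) -> float:
    """Gaussian (RBF) kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


class KernelPolicy:
    """Hypothetical deterministic policy pi(s) = sum_i alpha_i * k(c_i, s).

    A finite kernel expansion like this is an element of the RKHS induced
    by k (assumed Gaussian here), so the policy is a continuous function of
    the continuous state, with no discretization of the state space.
    """

    def __init__(self, bandwidth: float = 1.0) -> None:
        self.bandwidth = bandwidth
        self.centers: List[List[float]] = []  # kernel centers c_i (states)
        self.weights: List[float] = []        # expansion coefficients alpha_i

    def __call__(self, state: Sequence[float]) -> float:
        """Evaluate the action pi(state) as a weighted sum of kernels."""
        return sum(
            w * gaussian_kernel(c, state, self.bandwidth)
            for c, w in zip(self.centers, self.weights)
        )

    def functional_gradient_step(
        self, state: Sequence[float], grad_scalar: float, step_size: float
    ) -> None:
        """Generic functional stochastic-gradient ascent step (assumption).

        By the reproducing property, the functional gradient of pi(state)
        with respect to pi is k(state, .). Moving by step_size along
        grad_scalar * k(state, .) therefore appends one kernel center.
        """
        self.centers.append(list(state))
        self.weights.append(step_size * grad_scalar)

    def rkhs_norm_sq(self) -> float:
        """Squared RKHS norm ||pi||^2 = sum_{i,j} alpha_i * alpha_j * k(c_i, c_j)."""
        return sum(
            wi * wj * gaussian_kernel(ci, cj, self.bandwidth)
            for ci, wi in zip(self.centers, self.weights)
            for cj, wj in zip(self.centers, self.weights)
        )


# Usage: two symmetric steps on a 1-D state space.
policy = KernelPolicy(bandwidth=0.5)
policy.functional_gradient_step([0.0], grad_scalar=1.0, step_size=1.0)
policy.functional_gradient_step([1.0], grad_scalar=-1.0, step_size=1.0)
mid_action = policy([0.5])  # 0.0 by symmetry of the two centers
```

In this sketch each gradient step grows the expansion by one center, so the policy's representation grows with the number of updates; kernel-based methods often pair such updates with some form of sparsification, which is not shown here.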

Full text available at ieeexplore.ieee.org.

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
- G06N3/08—Learning methods
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06K9/6232—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods
- G06K9/6247—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods based on an approximation criterion, e.g. principal component analysis
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/12—Computer systems based on biological models using genetic models
- G06N3/126—Genetic algorithms, i.e. information processing using digital simulations of the genetic system
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/04—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6296—Graphical models, e.g. Bayesian networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6279—Classification techniques relating to the number of classes
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
- G06N5/04—Inference methods or devices
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/0265—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
- G05B13/027—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using neural networks only
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computer systems based on specific mathematical models
- G06N7/005—Probabilistic networks

Similar Documents

Publication	Publication Date	Title
Hansen et al.	2015	Evolution strategies
Paternain et al.	2020	Stochastic policy gradient ascent in reproducing kernel Hilbert spaces
CN109886343B (en)	2024-01-05	Image classification method and device, equipment and storage medium
Karg et al.	2019	Learning-based approximation of robust nonlinear predictive control with state estimation applied to a towing kite
Omidshafiei et al.	2016	Graph-based cross entropy method for solving multi-robot decentralized POMDPs
Liu et al.	2020	Stochastic loss function
Rozada et al.	2021	Low-rank state-action value-function approximation
Xu et al.	2011	Hierarchical approximate policy iteration with binary-tree state space decomposition
El-Laham et al.	2021	Policy gradient importance sampling for Bayesian inference
CN114863167B (en)	2024-02-02	A method, system, equipment and medium for image recognition and classification
Paternain et al.	2018	Learning policies for Markov decision processes in continuous spaces
Liu et al.	2013	Online Expectation Maximization for Reinforcement Learning in POMDPs
Chen et al.	2023	Attention loss adjusted prioritized experience replay
Villanueva et al.	2024	Stochastic optimal control of open quantum systems
Tziortziotis et al.	2012	A model based reinforcement learning approach using on-line clustering
Koppel et al.	2018	Nonparametric stochastic compositional gradient descent for Q-learning in continuous Markov decision problems
Manss et al.	2020	Consensus based distributed sparse Bayesian learning by fast marginal likelihood maximization
WO2025074369A1 (en)	2025-04-10	System and method for efficient collaborative MARL training using tensor networks
CN113920365B (en)	2025-09-19	Multi-source time sequence classification method, device, equipment and storage medium
Yan et al.	2024	MPC of uncertain nonlinear systems with meta-learning for fast adaptation of neural predictive models
Zhang et al.	2024	Offline Reinforcement Learning With Reverse Diffusion Guide Policy
Li et al.	2017	Policy gradient methods with gaussian process modelling acceleration
Li et al.	2021	Bayesian optimization with particle swarm
Lancewicki et al.	2015	Sequential covariance-matrix estimation with application to mitigating catastrophic forgetting
Jiang et al.	2019	Robust linear-complexity approach to full SLAM problems: Stochastic variational Bayes inference