Paternain et al., 2018 - Google Patents
Learning policies for Markov decision processes in continuous spaces
- Document ID
- 10833729593636862068
- Author
- Paternain S
- Bazerque J
- Small A
- Ribeiro A
- Publication year
- 2018
- Publication venue
- 2018 IEEE Conference on Decision and Control (CDC)
Snippet
In this work we consider the problem of policy optimization in the context of reinforcement learning. To avoid discretization, we select the optimal policy to be a continuous function belonging to a reproducing kernel Hilbert space (RKHS) which maximizes an …
Classifications
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N99/00—Subject matter not provided for in other groups of this subclass
        - G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
      - G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
        - G06K9/62—Methods or arrangements for recognition using electronic means
          - G06K9/6267—Classification techniques
            - G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N3/00—Computer systems based on biological models
        - G06N3/02—Computer systems based on biological models using neural network models
          - G06N3/08—Learning methods
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
      - G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
        - G06K9/62—Methods or arrangements for recognition using electronic means
          - G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
            - G06K9/6232—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods
              - G06K9/6247—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods based on an approximation criterion, e.g. principal component analysis
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N3/00—Computer systems based on biological models
        - G06N3/12—Computer systems based on biological models using genetic models
          - G06N3/126—Genetic algorithms, i.e. information processing using digital simulations of the genetic system
- G—PHYSICS
  - G05—CONTROLLING; REGULATING
    - G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
      - G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
        - G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
          - G05B13/04—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
      - G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
        - G06K9/62—Methods or arrangements for recognition using electronic means
          - G06K9/6296—Graphical models, e.g. Bayesian networks
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
      - G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
        - G06K9/62—Methods or arrangements for recognition using electronic means
          - G06K9/6267—Classification techniques
            - G06K9/6279—Classification techniques relating to the number of classes
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N5/00—Computer systems utilising knowledge based models
        - G06N5/04—Inference methods or devices
- G—PHYSICS
  - G05—CONTROLLING; REGULATING
    - G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
      - G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
        - G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
          - G05B13/0265—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
            - G05B13/027—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using neural networks only
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N7/00—Computer systems based on specific mathematical models
        - G06N7/005—Probabilistic networks
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| Hansen et al. | Evolution strategies | |
| Paternain et al. | Stochastic policy gradient ascent in reproducing kernel Hilbert spaces | |
| CN109886343B (en) | Image classification method and device, equipment and storage medium | |
| Karg et al. | Learning-based approximation of robust nonlinear predictive control with state estimation applied to a towing kite | |
| Omidshafiei et al. | Graph-based cross entropy method for solving multi-robot decentralized POMDPs | |
| Liu et al. | Stochastic loss function | |
| Rozada et al. | Low-rank state-action value-function approximation | |
| Xu et al. | Hierarchical approximate policy iteration with binary-tree state space decomposition | |
| El-Laham et al. | Policy gradient importance sampling for Bayesian inference | |
| CN114863167B (en) | A method, system, equipment and medium for image recognition and classification | |
| Paternain et al. | Learning policies for Markov decision processes in continuous spaces | |
| Liu et al. | Online Expectation Maximization for Reinforcement Learning in POMDPs | |
| Chen et al. | Attention loss adjusted prioritized experience replay | |
| Villanueva et al. | Stochastic optimal control of open quantum systems | |
| Tziortziotis et al. | A model based reinforcement learning approach using on-line clustering | |
| Koppel et al. | Nonparametric stochastic compositional gradient descent for Q-learning in continuous Markov decision problems | |
| Manss et al. | Consensus based distributed sparse Bayesian learning by fast marginal likelihood maximization | |
| WO2025074369A1 (en) | System and method for efficient collaborative MARL training using tensor networks | |
| CN113920365B (en) | Multi-source time sequence classification method, device, equipment and storage medium | |
| Yan et al. | MPC of uncertain nonlinear systems with meta-learning for fast adaptation of neural predictive models | |
| Zhang et al. | Offline Reinforcement Learning With Reverse Diffusion Guide Policy | |
| Li et al. | Policy gradient methods with Gaussian process modelling acceleration | |
| Li et al. | Bayesian optimization with particle swarm | |
| Lancewicki et al. | Sequential covariance-matrix estimation with application to mitigating catastrophic forgetting | |
| Jiang et al. | Robust linear-complexity approach to full SLAM problems: Stochastic variational Bayes inference |