[go: up one dir, main page]

Paternain et al., 2018 - Google Patents

Learning policies for markov decision processes in continuous spaces

Paternain et al., 2018

Document ID
10833729593636862068
Author
Paternain S
Bazerque J
Small A
Ribeiro A
Publication year
Publication venue
2018 IEEE Conference on Decision and Control (CDC)

External Links

Snippet

In this work we consider the problem of policy optimization in the context of reinforcement learning. In order to avoid discretization, we select the optimal policy to be a continuous function belonging to a reproducing Kernel Hilbert Space (RKHS) which maximizes an …
Continue reading at ieeexplore.ieee.org (other versions)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06NCOMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N99/00Subject matter not provided for in other groups of this subclass
    • G06N99/005Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62Methods or arrangements for recognition using electronic means
    • G06K9/6267Classification techniques
    • G06K9/6268Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06NCOMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computer systems based on biological models
    • G06N3/02Computer systems based on biological models using neural network models
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62Methods or arrangements for recognition using electronic means
    • G06K9/6217Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06K9/6232Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods
    • G06K9/6247Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods based on an approximation criterion, e.g. principal component analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06NCOMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computer systems based on biological models
    • G06N3/12Computer systems based on biological models using genetic models
    • G06N3/126Genetic algorithms, i.e. information processing using digital simulations of the genetic system
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05BCONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62Methods or arrangements for recognition using electronic means
    • G06K9/6296Graphical models, e.g. Bayesian networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62Methods or arrangements for recognition using electronic means
    • G06K9/6267Classification techniques
    • G06K9/6279Classification techniques relating to the number of classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06NCOMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computer systems utilising knowledge based models
    • G06N5/04Inference methods or devices
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05BCONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/0265Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
    • G05B13/027Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using neural networks only
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06NCOMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00Computer systems based on specific mathematical models
    • G06N7/005Probabilistic networks

Similar Documents

Publication Publication Date Title
Hansen et al. Evolution strategies
Paternain et al. Stochastic policy gradient ascent in reproducing kernel hilbert spaces
CN109886343B (en) Image classification method and device, equipment and storage medium
Karg et al. Learning-based approximation of robust nonlinear predictive control with state estimation applied to a towing kite
Omidshafiei et al. Graph-based cross entropy method for solving multi-robot decentralized POMDPs
Liu et al. Stochastic loss function
Rozada et al. Low-rank state-action value-function approximation
Xu et al. Hierarchical approximate policy iteration with binary-tree state space decomposition
El-Laham et al. Policy gradient importance sampling for Bayesian inference
CN114863167B (en) A method, system, equipment and medium for image recognition and classification
Paternain et al. Learning policies for markov decision processes in continuous spaces
Liu et al. Online Expectation Maximization for Reinforcement Learning in POMDPs.
Chen et al. Attention loss adjusted prioritized experience replay
Villanueva et al. Stochastic optimal control of open quantum systems
Tziortziotis et al. A model based reinforcement learning approach using on-line clustering
Koppel et al. Nonparametric stochastic compositional gradient descent for q-learning in continuous markov decision problems
Manss et al. Consensus based distributed sparse Bayesian learning by fast marginal likelihood maximization
WO2025074369A1 (en) System and method for efficient collaborative marl training using tensor networks
CN113920365B (en) Multi-source time sequence classification method, device, equipment and storage medium
Yan et al. Mpc of uncertain nonlinear systems with meta-learning for fast adaptation of neural predictive models
Zhang et al. Offline Reinforcement Learning With Reverse Diffusion Guide Policy
Li et al. Policy gradient methods with gaussian process modelling acceleration
Li et al. Bayesian optimization with particle swarm
Lancewicki et al. Sequential covariance-matrix estimation with application to mitigating catastrophic forgetting
Jiang et al. Robust linear-complexity approach to full SLAM problems: Stochastic variational Bayes inference