Morere et al., 2020 - Google Patents
Reinforcement learning with probabilistically complete exploration
- Document ID
- 12248662857176989759
- Author
- Morere P
- Francis G
- Blau T
- Ramos F
- Publication year
- 2020
- Publication venue
- arXiv preprint arXiv:2001.06940
Snippet
Balancing exploration and exploitation remains a key challenge in reinforcement learning (RL). State-of-the-art RL algorithms suffer from high sample complexity, particularly in the sparse reward case, where they can do no better than to explore in all directions until the …
Concepts
- reinforcement (title, abstract, description; 10 occurrences)
Classifications
- G—PHYSICS
  - G05—CONTROLLING; REGULATING
    - G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
      - G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
        - G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
          - G05B13/0265—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
            - G05B13/027—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using neural networks only
          - G05B13/04—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F15/00—Digital computers in general; Data processing equipment in general
        - G06F15/18—Digital computers in general; Data processing equipment in general in which a programme is changed according to experience gained by the computer itself during a complete run; Learning machines
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N3/00—Computer systems based on biological models
        - G06N3/02—Computer systems based on biological models using neural network models
          - G06N3/04—Architectures, e.g. interconnection topology
          - G06N3/08—Learning methods
        - G06N3/12—Computer systems based on biological models using genetic models
          - G06N3/126—Genetic algorithms, i.e. information processing using digital simulations of the genetic system
      - G06N5/00—Computer systems utilising knowledge based models
        - G06N5/02—Knowledge representation
          - G06N5/022—Knowledge engineering, knowledge acquisition
        - G06N5/04—Inference methods or devices
      - G06N7/00—Computer systems based on specific mathematical models
        - G06N7/005—Probabilistic networks
      - G06N99/00—Subject matter not provided for in other groups of this subclass
        - G06N99/002—Quantum computers, i.e. information processing by using quantum superposition, coherence, decoherence, entanglement, nonlocality, teleportation
        - G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
Similar Documents
| Publication | Title |
|---|---|
| Taylor et al. | Episodic learning with control Lyapunov functions for uncertain robotic systems |
| US11086938B2 (en) | Interpreting human-robot instructions |
| Okada et al. | Path integral networks: End-to-end differentiable optimal control |
| Hiraoka et al. | Learning robust options by conditional value at risk optimization |
| Englert et al. | Combined optimization and reinforcement learning for manipulation skills |
| Rigatos et al. | Parallelization of a fuzzy control algorithm using quantum computation |
| JPWO2007135723A1 (en) | Neural network learning apparatus, method, and program |
| Liaw et al. | Composing meta-policies for autonomous driving using hierarchical deep reinforcement learning |
| Qadri et al. | InCOpt: Incremental constrained optimization using the Bayes tree |
| Vinogradska et al. | Numerical quadrature for probabilistic policy search |
| Morere et al. | Reinforcement learning with probabilistically complete exploration |
| Li et al. | Bayesian distributional policy gradients |
| Ortiz-Haro et al. | iDb-A*: Iterative search and optimization for optimal kinodynamic motion planning |
| Ollington et al. | Incorporating expert advice into reinforcement learning using constructive neural networks |
| Mukadam et al. | Approximately optimal continuous-time motion planning and control via probabilistic inference |
| Wang et al. | Deep bilinear Koopman realization for dynamics modeling and predictive control |
| Arora et al. | I2RL: online inverse reinforcement learning under occlusion |
| Sugiarto et al. | A model-based approach to robot kinematics and control using discrete factor graphs with belief propagation |
| Lee et al. | A dynamic regret analysis and adaptive regularization algorithm for on-policy robot imitation learning |
| Cheng | Efficient and principled robot learning: theory and algorithms |
| Blau et al. | Learning from demonstration without demonstrations |
| McAllister | Bayesian learning for data-efficient control |
| Debnath et al. | Solving Markov decision processes with reachability characterization from mean first passage times |
| Macciò et al. | Local models for data-driven learning of control policies for complex systems |
| Fröhlich | Data-efficient controller tuning and reinforcement learning |