Morere et al., 2020 - Google Patents
Reinforcement learning with probabilistically complete exploration
- Document ID
- 12248662857176989759
- Author
- Morere P
- Francis G
- Blau T
- Ramos F
- Publication year
- 2020
- Publication venue
- arXiv preprint arXiv:2001.06940
Snippet
Balancing exploration and exploitation remains a key challenge in reinforcement learning (RL). State-of-the-art RL algorithms suffer from high sample complexity, particularly in the sparse reward case, where they can do no better than to explore in all directions until the …
Concepts
- reinforcement (title, abstract, description; 10 occurrences)
Classifications
- G—PHYSICS
  - G05—CONTROLLING; REGULATING
    - G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
      - G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
        - G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
          - G05B13/0265—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
            - G05B13/027—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using neural networks only
          - G05B13/04—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F15/00—Digital computers in general; Data processing equipment in general
        - G06F15/18—Digital computers in general; Data processing equipment in general in which a programme is changed according to experience gained by the computer itself during a complete run; Learning machines
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N3/00—Computer systems based on biological models
        - G06N3/02—Computer systems based on biological models using neural network models
          - G06N3/04—Architectures, e.g. interconnection topology
          - G06N3/08—Learning methods
        - G06N3/12—Computer systems based on biological models using genetic models
          - G06N3/126—Genetic algorithms, i.e. information processing using digital simulations of the genetic system
      - G06N5/00—Computer systems utilising knowledge based models
        - G06N5/02—Knowledge representation
          - G06N5/022—Knowledge engineering, knowledge acquisition
        - G06N5/04—Inference methods or devices
      - G06N7/00—Computer systems based on specific mathematical models
        - G06N7/005—Probabilistic networks
      - G06N99/00—Subject matter not provided for in other groups of this subclass
        - G06N99/002—Quantum computers, i.e. information processing by using quantum superposition, coherence, decoherence, entanglement, nonlocality, teleportation
        - G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
Similar Documents
| Publication | Title |
|---|---|
| Taylor et al. | Episodic learning with control Lyapunov functions for uncertain robotic systems |
| US11086938B2 (en) | Interpreting human-robot instructions |
| Okada et al. | Path integral networks: End-to-end differentiable optimal control |
| Hiraoka et al. | Learning robust options by conditional value at risk optimization |
| Englert et al. | Combined optimization and reinforcement learning for manipulation skills |
| Rigatos et al. | Parallelization of a fuzzy control algorithm using quantum computation |
| JPWO2007135723A1 (en) | Neural network learning apparatus, method, and program |
| Liaw et al. | Composing meta-policies for autonomous driving using hierarchical deep reinforcement learning |
| Qadri et al. | InCOpt: Incremental constrained optimization using the Bayes tree |
| Vinogradska et al. | Numerical quadrature for probabilistic policy search |
| Morere et al. | Reinforcement learning with probabilistically complete exploration |
| Li et al. | Bayesian distributional policy gradients |
| Ortiz-Haro et al. | iDb-A*: Iterative search and optimization for optimal kinodynamic motion planning |
| Ollington et al. | Incorporating expert advice into reinforcement learning using constructive neural networks |
| Mukadam et al. | Approximately optimal continuous-time motion planning and control via probabilistic inference |
| Wang et al. | Deep bilinear Koopman realization for dynamics modeling and predictive control |
| Arora et al. | I2RL: online inverse reinforcement learning under occlusion |
| Sugiarto et al. | A model-based approach to robot kinematics and control using discrete factor graphs with belief propagation |
| Lee et al. | A dynamic regret analysis and adaptive regularization algorithm for on-policy robot imitation learning |
| Cheng | Efficient and principled robot learning: theory and algorithms |
| Blau et al. | Learning from demonstration without demonstrations |
| McAllister | Bayesian learning for data-efficient control |
| Debnath et al. | Solving Markov decision processes with reachability characterization from mean first passage times |
| Macciò et al. | Local models for data-driven learning of control policies for complex systems |
| Fröhlich | Data-efficient controller tuning and reinforcement learning |