Hierarchical approximate policy iteration with binary-tree state space decomposition
Xu et al., 2011
- Document ID
- 8068936070059303595
- Author
- Xu X
- Liu C
- Yang S
- Hu D
- Publication year
- 2011
- Publication venue
- IEEE Transactions on Neural Networks
Snippet
In recent years, approximate policy iteration (API) has attracted increasing attention in reinforcement learning (RL), e.g., least-squares policy iteration (LSPI) and its kernelized version, the kernel-based LSPI algorithm. However, it remains difficult for API algorithms to …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
- G06N3/04—Architectures, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06K9/6232—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods
- G06K9/6247—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods based on an approximation criterion, e.g. principal component analysis
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/0265—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
- G05B13/027—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using neural networks only
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/04—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/12—Computer systems based on biological models using genetic models
- G06N3/126—Genetic algorithms, i.e. information processing using digital simulations of the genetic system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
- G06N5/04—Inference methods or devices
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| Hu et al. | On transforming reinforcement learning with transformers: The development trajectory | |
| Doerr et al. | Direct Loss Minimization Inverse Optimal Control. | |
| CN106096729B (en) | A kind of depth-size strategy learning method towards complex task in extensive environment | |
| Arnekvist et al. | Vpe: Variational policy embedding for transfer reinforcement learning | |
| Xu et al. | Hierarchical approximate policy iteration with binary-tree state space decomposition | |
| Levine et al. | Prediction, consistency, curvature: Representation learning for locally-linear control | |
| Agia et al. | Stap: Sequencing task-agnostic policies | |
| Xu et al. | Manifold-based reinforcement learning via locally linear reconstruction | |
| Cetin et al. | Domain-robust visual imitation learning with mutual information constraints | |
| Nozari et al. | Active inference integrated with imitation learning for autonomous driving | |
| Choi et al. | Efficient policy adaptation with contrastive prompt ensemble for embodied agents | |
| Dax et al. | Disentangled neural relational inference for interpretable motion prediction | |
| Zhao et al. | Efficient online estimation of empowerment for reinforcement learning | |
| Hu et al. | Toward multi-task generalization in autonomous navigation: A human-in-the-loop adversarial reinforcement learning with diffusion policy | |
| Yang et al. | An interrelated imitation learning method for heterogeneous drone swarm coordination | |
| Li et al. | Efficient vehicle trajectory prediction with goal lane segments and dual-stream cross attention | |
| Gode et al. | Flownav: Combining flow matching and depth priors for efficient navigation | |
| Guzman et al. | Adaptive model predictive control by learning classifiers | |
| Hussein et al. | Incremental learning for enhanced personalization of autocomplete teleoperation | |
| Jiao et al. | Evadrive: Evolutionary adversarial policy optimization for end-to-end autonomous driving | |
| Galashov et al. | Importance weighted policy learning and adaptation | |
| Qin et al. | Integrating Diffusion-based Multi-task Learning with Online Reinforcement Learning for Robust Quadruped Robot Control | |
| Wang et al. | Evolutionary Multitasking Collaborative Neural Architecture Search for Scene Classification | |
| Gode et al. | FlowNav: Learning Efficient Navigation Policies via Conditional Flow Matching | |
| Haofeng et al. | Learning complicated manipulation skills via deterministic policy with limited demonstrations |