Xu et al., 2011 - Google Patents

Hierarchical approximate policy iteration with binary-tree state space decomposition

Document ID: 8068936070059303595
Author: Xu X; Liu C; Yang S; Hu D
Publication year: 2011
Publication venue: IEEE Transactions on Neural Networks

Snippet

In recent years, approximate policy iteration (API) has attracted increasing attention in reinforcement learning (RL), e.g., least-squares policy iteration (LSPI) and its kernelized version, the kernel-based LSPI algorithm. However, it remains difficult for API algorithms to …

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
- G06N3/08—Learning methods
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
- G06N3/04—Architectures, e.g. interconnection topology
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06K9/6232—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods
- G06K9/6247—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods based on an approximation criterion, e.g. principal component analysis
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/0265—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
- G05B13/027—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using neural networks only
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/04—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/12—Computer systems based on biological models using genetic models
- G06N3/126—Genetic algorithms, i.e. information processing using digital simulations of the genetic system
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
- G06N5/04—Inference methods or devices

Similar Documents

Publication	Publication Date	Title
Hu et al.	2024	On transforming reinforcement learning with transformers: The development trajectory
Doerr et al.	2015	Direct Loss Minimization Inverse Optimal Control.
CN106096729B (en)	2018-11-20	A kind of depth-size strategy learning method towards complex task in extensive environment
Arnekvist et al.	2019	Vpe: Variational policy embedding for transfer reinforcement learning
Xu et al.	2011	Hierarchical approximate policy iteration with binary-tree state space decomposition
Levine et al.	2019	Prediction, consistency, curvature: Representation learning for locally-linear control
Agia et al.	2022	Stap: Sequencing task-agnostic policies
Xu et al.	2016	Manifold-based reinforcement learning via locally linear reconstruction
Cetin et al.	2021	Domain-robust visual imitation learning with mutual information constraints
Nozari et al.	2022	Active inference integrated with imitation learning for autonomous driving
Choi et al.	2024	Efficient policy adaptation with contrastive prompt ensemble for embodied agents
Dax et al.	2023	Disentangled neural relational inference for interpretable motion prediction
Zhao et al.	2020	Efficient online estimation of empowerment for reinforcement learning
Hu et al.	2025	Toward multi-task generalization in autonomous navigation: A human-in-the-loop adversarial reinforcement learning with diffusion policy
Yang et al.	2022	An interrelated imitation learning method for heterogeneous drone swarm coordination
Li et al.	2024	Efficient vehicle trajectory prediction with goal lane segments and dual-stream cross attention
Gode et al.	2024	Flownav: Combining flow matching and depth priors for efficient navigation
Guzman et al.	2022	Adaptive model predictive control by learning classifiers
Hussein et al.	2022	Incremental learning for enhanced personalization of autocomplete teleoperation
Jiao et al.	2025	Evadrive: Evolutionary adversarial policy optimization for end-to-end autonomous driving
Galashov et al.	2020	Importance weighted policy learning and adaptation
Qin et al.	2025	Integrating Diffusion-based Multi-task Learning with Online Reinforcement Learning for Robust Quadruped Robot Control
Wang et al.	2024	Evolutionary Multitasking Collaborative Neural Architecture Search for Scene Classification
Gode et al.	2024	FlowNav: Learning Efficient Navigation Policies via Conditional Flow Matching
Haofeng et al.	2023	Learning complicated manipulation skills via deterministic policy with limited demonstrations