Xu et al., 2011 - Google Patents

Hierarchical approximate policy iteration with binary-tree state space decomposition

Document ID
8068936070059303595
Author
Xu X
Liu C
Yang S
Hu D
Publication year
2011
Publication venue
IEEE Transactions on Neural Networks

Snippet

In recent years, approximate policy iteration (API) has attracted increasing attention in reinforcement learning (RL), e.g., least-squares policy iteration (LSPI) and its kernelized version, the kernel-based LSPI algorithm. However, it remains difficult for API algorithms to …

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N99/00 Subject matter not provided for in other groups of this subclass
    • G06N99/005 Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computer systems based on biological models
    • G06N3/02 Computer systems based on biological models using neural network models
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computer systems based on biological models
    • G06N3/02 Computer systems based on biological models using neural network models
    • G06N3/04 Architectures, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62 Methods or arrangements for recognition using electronic means
    • G06K9/6217 Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06K9/6232 Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods
    • G06K9/6247 Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods based on an approximation criterion, e.g. principal component analysis
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/0265 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
    • G05B13/027 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using neural networks only
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62 Methods or arrangements for recognition using electronic means
    • G06K9/6267 Classification techniques
    • G06K9/6268 Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computer systems based on biological models
    • G06N3/12 Computer systems based on biological models using genetic models
    • G06N3/126 Genetic algorithms, i.e. information processing using digital simulations of the genetic system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computer systems utilising knowledge based models
    • G06N5/04 Inference methods or devices

Similar Documents

Publication Title
Hu et al. On transforming reinforcement learning with transformers: The development trajectory
Doerr et al. Direct Loss Minimization Inverse Optimal Control.
CN106096729B (en) A kind of depth-size strategy learning method towards complex task in extensive environment
Arnekvist et al. Vpe: Variational policy embedding for transfer reinforcement learning
Xu et al. Hierarchical approximate policy iteration with binary-tree state space decomposition
Levine et al. Prediction, consistency, curvature: Representation learning for locally-linear control
Agia et al. Stap: Sequencing task-agnostic policies
Xu et al. Manifold-based reinforcement learning via locally linear reconstruction
Cetin et al. Domain-robust visual imitation learning with mutual information constraints
Nozari et al. Active inference integrated with imitation learning for autonomous driving
Choi et al. Efficient policy adaptation with contrastive prompt ensemble for embodied agents
Dax et al. Disentangled neural relational inference for interpretable motion prediction
Zhao et al. Efficient online estimation of empowerment for reinforcement learning
Hu et al. Toward multi-task generalization in autonomous navigation: A human-in-the-loop adversarial reinforcement learning with diffusion policy
Yang et al. An interrelated imitation learning method for heterogeneous drone swarm coordination
Li et al. Efficient vehicle trajectory prediction with goal lane segments and dual-stream cross attention
Gode et al. Flownav: Combining flow matching and depth priors for efficient navigation
Guzman et al. Adaptive model predictive control by learning classifiers
Hussein et al. Incremental learning for enhanced personalization of autocomplete teleoperation
Jiao et al. Evadrive: Evolutionary adversarial policy optimization for end-to-end autonomous driving
Galashov et al. Importance weighted policy learning and adaptation
Qin et al. Integrating Diffusion-based Multi-task Learning with Online Reinforcement Learning for Robust Quadruped Robot Control
Wang et al. Evolutionary Multitasking Collaborative Neural Architecture Search for Scene Classification
Gode et al. FlowNav: Learning Efficient Navigation Policies via Conditional Flow Matching
Haofeng et al. Learning complicated manipulation skills via deterministic policy with limited demonstrations