Policy representation via diffusion probability model for reinforcement learning
Yang et al., 2023
- Document ID
- 9971436291508933587
- Author
- Yang L
- Huang Z
- Lei F
- Zhong Y
- Yang Y
- Fang C
- Wen S
- Zhou B
- Lin Z
- Publication year
- 2023
- Publication venue
- arXiv preprint arXiv:2305.13122
Snippet
Popular reinforcement learning (RL) algorithms tend to produce a unimodal policy distribution, which limits the expressiveness of complex policies and weakens exploration. The diffusion probability model is a powerful tool for learning complicated multimodal …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
- G06N5/02—Knowledge representation
- G06N5/022—Knowledge engineering, knowledge acquisition
- G06N5/025—Extracting rules from data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding or deleting nodes or connections, pruning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
- G06N3/04—Architectures, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/12—Computer systems based on biological models using genetic models
- G06N3/126—Genetic algorithms, i.e. information processing using digital simulations of the genetic system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
- G06N3/08—Learning methods
- G06N3/086—Learning methods using evolutionary programming, e.g. genetic algorithms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
- G06N5/04—Inference methods or devices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computer systems based on specific mathematical models
- G06N7/005—Probabilistic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
- G06N5/003—Dynamic search techniques, heuristics, branch-and-bound
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
- G06F17/5009—Computer-aided design using simulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/11—Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06K9/6232—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods
- G06K9/6247—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods based on an approximation criterion, e.g. principal component analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/18—Digital computers in general; Data processing equipment in general in which a programme is changed according to experience gained by the computer itself during a complete run; Learning machines
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| Yang et al. | Policy representation via diffusion probability model for reinforcement learning | |
| Bengio et al. | A meta-transfer objective for learning to disentangle causal mechanisms | |
| Ueltzhöffer | Deep active inference | |
| Li et al. | Hierarchical diffusion for offline decision making | |
| Yang et al. | Learn to explain efficiently via neural logic inductive learning | |
| Moerland et al. | A0c: Alpha zero in continuous action space | |
| Ritchie et al. | Deep amortized inference for probabilistic programs | |
| Xu et al. | Learning to explore via meta-policy gradient | |
| Xiong et al. | Teilp: Time prediction over knowledge graphs via logical reasoning | |
| Shrivastava et al. | GLAD: Learning sparse graph recovery | |
| Ortega et al. | A minimum relative entropy principle for learning and acting | |
| CN119254483A (en) | Network risk analysis method and system based on multi-level game model | |
| David et al. | DEVS model construction as a reinforcement learning problem | |
| CN118674001A (en) | State action relation reinforcement learning method integrating graph convolution and large language model | |
| WO2024063907A1 (en) | Modelling causation in machine learning | |
| Jiang et al. | Vertical symbolic regression via deep policy gradient | |
| Sarkar et al. | QKSA: Quantum Knowledge Seeking Agent--resource-optimized reinforcement learning using quantum process tomography | |
| EP4591217A1 (en) | Modelling causation in machine learning | |
| Neitz et al. | A teacher-student framework to distill future trajectories | |
| Skryagin et al. | Sum-product logic: integrating probabilistic circuits into deepproblog | |
| Xu et al. | A framework for following temporal logic instructions with unknown causal dependencies | |
| de Souza et al. | Hypergraph neural networks with logic clauses | |
| Zhong et al. | A deep learning assisted gene expression programming framework for symbolic regression problems | |
| Winqvist | Neural Network Approaches for Model Predictive Control | |
| Kielak | Generative Adversarial Imagination for Sample Efficient Deep Reinforcement Learning |