ShinRL: A Library for Evaluating RL Algorithms from Theoretical and Practical Perspectives (Deep RL Workshop 2021)

Status: Under development (expect bug fixes and major updates)


ShinRL is an open-source JAX library specialized for the evaluation of reinforcement learning (RL) algorithms from both theoretical and practical perspectives. Please take a look at the paper for details. Try ShinRL at experiments/QuickStart.ipynb.

QuickStart

import gym
from shinrl import DiscreteViSolver
import matplotlib.pyplot as plt

# make an env & a config
env = gym.make("ShinPendulum-v0")
config = DiscreteViSolver.DefaultConfig(explore="eps_greedy", approx="nn", steps_per_epoch=10000)

# make & run a solver
mixins = DiscreteViSolver.make_mixins(env, config)
dqn_solver = DiscreteViSolver.factory(env, config, mixins)
dqn_solver.run()

# plot performance
returns = dqn_solver.scalars["Return"]
plt.plot(returns["x"], returns["y"])

# plot learned q-values  (action == 0)
q0 = dqn_solver.data["Q"][:, 0]
env.plot_S(q0, title="Learned")


⚡ Key Modules


🔬 ShinEnv for Oracle Analysis

  • ShinEnv provides small environments with oracle methods that can compute exact quantities.
  • Some environments also support continuous action spaces and image observations:
  • See the tutorial for details: experiments/Tutorials/ShinEnvTutorial.ipynb.
| Environment | Discrete action | Continuous action | Image Observation | Tuple Observation |
|---|---|---|---|---|
| ShinMaze | ✔️ | | | ✔️ |
| ShinMountainCar-v0 | ✔️ | ✔️ | ✔️ | ✔️ |
| ShinPendulum-v0 | ✔️ | ✔️ | ✔️ | ✔️ |
| ShinCartPole-v0 | ✔️ | ✔️ | | ✔️ |
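Conceptually, an oracle method can compute exact quantities because a ShinEnv exposes the full MDP. A minimal self-contained sketch of the idea (plain NumPy with a hypothetical toy MDP, not the ShinRL API) computing exact optimal Q-values by dynamic programming:

```python
import numpy as np

# A tiny 2-state, 2-action MDP given by explicit tensors, mimicking
# what an oracle environment knows internally.
# P[s, a, s'] = transition probability, R[s, a] = reward.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # transitions from state 0
    [[0.0, 1.0], [1.0, 0.0]],   # transitions from state 1
])
R = np.array([
    [0.0, 1.0],
    [0.5, 0.0],
])
gamma = 0.95

def exact_q(P, R, gamma, iters=1000):
    """Compute exact optimal Q-values by value iteration over the full MDP."""
    Q = np.zeros_like(R)
    for _ in range(iters):
        V = Q.max(axis=1)          # greedy state values
        Q = R + gamma * P @ V      # Bellman optimality backup
    return Q

Q = exact_q(P, R, gamma)
print(Q)  # exact optimal Q-values, no sampling error
```

Because the quantities are exact, they can serve as ground truth when evaluating a learned Q-function, which is the role ShinEnv's oracle methods play.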

🏭 Flexible Solver by MixIn

  • A Solver solves an environment with specified algorithms.
  • A "mixin" is a class that defines and implements a single feature. ShinRL's solvers are instantiated by combining several mixins.
  • See the tutorial for details: experiments/Tutorials/SolverTutorial.ipynb.
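The mixin pattern itself is plain Python: a concrete class is assembled at runtime from a tuple of feature classes. A schematic sketch (hypothetical class names, not ShinRL's actual mixins) of what `make_mixins` and `factory` conceptually do:

```python
# Each mixin implements one feature; the solver is the composition.
class TabularEvalMixin:
    def evaluate(self):
        return f"evaluating {self.env} exactly (tabular)"

class EpsGreedyExploreMixin:
    def explore(self):
        return f"eps-greedy exploration on {self.env}"

class BaseSolver:
    def __init__(self, env):
        self.env = env

# Compose a concrete solver from the chosen mixins at runtime,
# analogous in spirit to DiscreteViSolver.factory(env, config, mixins).
mixins = (EpsGreedyExploreMixin, TabularEvalMixin, BaseSolver)
Solver = type("Solver", mixins, {})

solver = Solver("ShinPendulum-v0")
print(solver.explore())   # eps-greedy exploration on ShinPendulum-v0
```

Swapping one mixin for another (e.g., a tabular evaluator for a neural one) changes a single feature without touching the rest of the solver, which is why one solver class can cover many algorithm variants.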


Implemented Popular Algorithms

  • The table below lists the implemented popular algorithms.
  • Note that it does not list all implemented variants (e.g., the DDP¹ version of DQN). See the make_mixins functions of the solvers for the implemented variants.
  • Note that the implementations may differ from the original papers for simplicity (e.g., Discrete SAC). See the solvers' source code for details.
| Algorithm | Solver | Configuration | Type¹ |
|---|---|---|---|
| Value Iteration (VI) | DiscreteViSolver | approx == "tabular" & explore == "oracle" | TDP |
| Policy Iteration (PI) | DiscretePiSolver | approx == "tabular" & explore == "oracle" | TDP |
| Conservative Value Iteration (CVI) | DiscreteViSolver | approx == "tabular" & explore == "oracle" & er_coef != 0 & kl_coef != 0 | TDP |
| Tabular Q Learning | DiscreteViSolver | approx == "tabular" & explore != "oracle" | TRL |
| SARSA | DiscretePiSolver | approx == "tabular" & explore != "oracle" & eps_decay_target_pol > 0 | TRL |
| Deep Q Network (DQN) | DiscreteViSolver | approx == "nn" & explore != "oracle" | DRL |
| Soft DQN | DiscreteViSolver | approx == "nn" & explore != "oracle" & er_coef != 0 | DRL |
| Munchausen-DQN | DiscreteViSolver | approx == "nn" & explore != "oracle" & er_coef != 0 & kl_coef != 0 | DRL |
| Double-DQN | DiscreteViSolver | approx == "nn" & explore != "oracle" & use_double_q == True | DRL |
| Discrete Soft Actor Critic | DiscretePiSolver | approx == "nn" & explore != "oracle" & er_coef != 0 | DRL |
| Deep Deterministic Policy Gradient (DDPG) | ContinuousDdpgSolver | approx == "nn" & explore != "oracle" | DRL |

¹ Algorithm Type:

  • TDP (approx=="tabular" & explore=="oracle"): Tabular Dynamic Programming algorithms. No exploration, no approximation; the complete specification of the MDP is given.
  • TRL (approx=="tabular" & explore!="oracle"): Tabular Reinforcement Learning algorithms. No approximation, but the dynamics and reward functions are unknown.
  • DDP (approx=="nn" & explore=="oracle"): Deep Dynamic Programming algorithms. Same as TDP, except that the computed values are approximated by neural networks.
  • DRL (approx=="nn" & explore!="oracle"): Deep Reinforcement Learning algorithms. Same as TRL, except that the computed values are approximated by neural networks.
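The difference between the types is only in what the solver may use. For instance, a TRL solver must estimate Q-values from sampled transitions instead of reading the MDP tensors. A toy sketch (plain NumPy on a hypothetical 2-state MDP, not the ShinRL API) of tabular Q-learning with eps-greedy exploration:

```python
import numpy as np

rng = np.random.default_rng(0)

# The same kind of MDP tensors an oracle would use; a TRL solver
# may only *sample* transitions from P, never read it directly.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.0, 1.0], [1.0, 0.0]]])
R = np.array([[0.0, 1.0], [0.5, 0.0]])
gamma, eps = 0.95, 0.1

Q = np.zeros((2, 2))
counts = np.zeros((2, 2))
s = 0
for _ in range(100_000):
    # eps-greedy exploration replaces the oracle
    a = int(rng.integers(2)) if rng.random() < eps else int(Q[s].argmax())
    s_next = int(rng.choice(2, p=P[s, a]))
    counts[s, a] += 1
    lr = 1.0 / counts[s, a] ** 0.6          # decaying step size
    target = R[s, a] + gamma * Q[s_next].max()
    Q[s, a] += lr * (target - Q[s, a])
    s = s_next

print(np.round(Q, 2))  # approaches the exact Q-values a TDP solver computes directly
```

A DRL solver replaces the Q table with a neural network but learns from the same kind of sampled transitions.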

Installation

git clone git@github.com:omron-sinicx/ShinRL.git
cd ShinRL
pip install -e .

Test

cd ShinRL
make test

Format

cd ShinRL
make format

Docker

cd ShinRL
docker-compose up

Citation

# NeurIPS Deep RL Workshop 2021 version (pytorch branch)
@inproceedings{toshinori2021shinrl,
    author = {Kitamura, Toshinori and Yonetani, Ryo},
    title = {ShinRL: A Library for Evaluating RL Algorithms from Theoretical and Practical Perspectives},
    year = {2021},
    booktitle = {Proceedings of the NeurIPS Deep RL Workshop},
}

# arXiv version (commit 2d3da)
@article{toshinori2021shinrlArxiv,
    author = {Kitamura, Toshinori and Yonetani, Ryo},
    title = {ShinRL: A Library for Evaluating RL Algorithms from Theoretical and Practical Perspectives},
    year = {2021},
    url = {https://arxiv.org/abs/2112.04123},
    journal={arXiv preprint arXiv:2112.04123},
}
