Efficient and optimal vehicle path-tracking control using model-based deep reinforcement learning: two actor-critic neural networks are trained to compute optimal steering actions for following an arbitrary path. To validate the optimality of the trained parameterized control policy, the actor network's solution is compared with the solution that MPC (IPOPT) provides for the corresponding optimal control problem.
Compared to the original repository, the objective functions of the actor-critic networks and of the MPC are kept exactly the same, and the networks are retrained with adapted hyperparameters. The retrained networks achieve a clear improvement over the baseline results obtained from the pretrained network provided in the original repository. The baseline exhibits an oscillatory response in both the heading-angle error and the steering action that is absent from the optimal MPC solution, which calls the optimality of the previously provided networks into question. With the retrained networks, the oscillatory behavior is almost eliminated, and both the heading-angle error and the steering control closely match the optimal MPC solution. This validates that the retrained networks provide a near-optimal solution to the corresponding OCP.
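For reference, the core training pattern (alternating policy evaluation and policy improvement against a differentiable model) can be sketched as follows. This is a minimal illustration only: `dynamics_step`, `utility`, the four-dimensional error state, and the network sizes are placeholder assumptions, not the repository's actual definitions in `dynamics.py`, `network.py`, or `train.py`.

```python
import torch
import torch.nn as nn

# Placeholder stand-ins for the repository's vehicle model and stage cost
# (the real definitions live in dynamics.py and config.py).
def dynamics_step(state, steer):
    # Toy linear error dynamics: NOT the bicycle model used by the repo.
    return state + 0.01 * torch.cat([state[:, 1:], steer], dim=1)

def utility(state, steer):
    # Quadratic tracking cost on the error state plus steering effort.
    return (state ** 2).sum(dim=1) + 0.1 * (steer ** 2).sum(dim=1)

actor = nn.Sequential(nn.Linear(4, 64), nn.ELU(), nn.Linear(64, 1), nn.Tanh())
critic = nn.Sequential(nn.Linear(4, 64), nn.ELU(), nn.Linear(64, 1))
opt_actor = torch.optim.Adam(actor.parameters(), lr=1e-4)
opt_critic = torch.optim.Adam(critic.parameters(), lr=1e-3)

for iteration in range(10000):
    state = torch.randn(256, 4)  # batch of sampled tracking-error states

    # Policy evaluation (PEV): fit V(s) to the one-step Bellman target.
    with torch.no_grad():
        steer = actor(state)
        target = utility(state, steer) + critic(dynamics_step(state, steer)).squeeze(1)
    critic_loss = ((critic(state).squeeze(1) - target) ** 2).mean()
    opt_critic.zero_grad()
    critic_loss.backward()
    opt_critic.step()

    # Policy improvement (PIM): minimize l(s, a) + V(f(s, a)) w.r.t. the actor.
    steer = actor(state)
    actor_loss = (utility(state, steer) + critic(dynamics_step(state, steer)).squeeze(1)).mean()
    opt_actor.zero_grad()
    actor_loss.backward()
    opt_actor.step()
```

Because the model is differentiable, the actor loss backpropagates through both the stage cost and the critic's evaluation of the successor state, which is what distinguishes this model-based scheme from model-free policy gradients.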
The retrained network is included in this repository; trained results are stored in `Results_dir`.
Many thanks to Haitong Ma for open-sourcing the repository below, which helped me understand ADP and actor-critic MBRL.
This is a modified version of the original repository.
The contents below are from the original repository.

---

Code demo for Chapter 8, *Reinforcement Learning and Control*.
Methods: Approximate Dynamic Programming, Model Predictive Control
Requirements: PyTorch 1.4.0
- To train an agent, follow the example code in `main.py` and tune the parameters. Change the `METHODS` variable to adjust the methods compared in the simulation stage.
- Simulations are executed automatically after training finishes. To start a simulation separately from trained results and compare the performance of ADP and MPC, run `simulation.py`. Change the `LOG_DIR` variable to set which trained results are loaded (see the sketch below).
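For example, the two variables might be set as follows. These values are illustrative only; the accepted method names and the results-folder layout are defined in `main.py` and `simulation.py`.

```python
# In main.py: choose the controllers to compare in the simulation stage.
METHODS = ['ADP', 'MPC']  # illustrative values; check main.py for the accepted names

# In simulation.py: point at a finished training run to load.
LOG_DIR = './Results_dir/<your_run>'  # illustrative path; check simulation.py
```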
Approximate-Dynamic-Programming
│  main.py - Main script
│  plot.py - Plots comparisons between ADP and MPC
│  train.py - Executes PEV and PIM
│  dynamics.py - Vehicle model
│  network.py - Network structure
│  solver.py - Solvers for MPC using CasADi (see the sketch after this tree)
│  config.py - Configuration for training and the vehicle model
│  simulation.py - Runs experiments to compare ADP and MPC
│  readme.md
│  requirements.txt
│
├─ Results_dir - Stores trained results
│
└─ Simulation_dir - Stores simulation data and plots
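The MPC baseline in `solver.py` formulates the tracking OCP in CasADi and solves it with IPOPT. The following is a minimal sketch of that pattern only: the horizon, the toy error dynamics, and the cost weights are assumptions for illustration and do not reproduce the repository's actual bicycle-model formulation.

```python
import casadi as ca

N, dt = 30, 0.05                       # illustrative horizon and step size
x = ca.SX.sym('x', 3)                  # toy error state (NOT the repo's bicycle model)
u = ca.SX.sym('u', 1)                  # steering input
x_next = x + dt * ca.vertcat(x[1], x[2], -2.0 * x[2] + 5.0 * u[0])
f = ca.Function('f', [x, u], [x_next])

X = ca.SX.sym('X', 3, N + 1)           # state trajectory (multiple shooting)
U = ca.SX.sym('U', 1, N)               # steering trajectory
x0 = ca.SX.sym('x0', 3)                # current tracking error (parameter)

cost = 0
g = [X[:, 0] - x0]                     # initial-state constraint
for k in range(N):
    cost += ca.sumsqr(X[:, k]) + 0.1 * ca.sumsqr(U[:, k])  # illustrative weights
    g.append(X[:, k + 1] - f(X[:, k], U[:, k]))            # shooting-gap constraints

nlp = {'x': ca.vertcat(ca.vec(X), ca.vec(U)),
       'p': x0, 'f': cost, 'g': ca.vertcat(*g)}
solver = ca.nlpsol('solver', 'ipopt', nlp)
sol = solver(x0=0, p=[0.5, 0.1, 0.0], lbg=0, ubg=0)
print(float(sol['f']))                 # optimal cost for this initial error
```

In a receding-horizon loop, only the first element of the optimized steering sequence would be applied before re-solving from the next measured state.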
References:
- *Reinforcement Learning and Control*. Tsinghua University Lecture Notes, 2020.
- Andersson, J. A. E., Gillis, J., Horn, G., Rawlings, J. B., and Diehl, M. CasADi: a software framework for nonlinear optimization and optimal control. *Mathematical Programming Computation*, 11(1):1–36, 2019.