In addition to the original repository, I also provide additional trained networks with collision avoidance guidance. Please refer to the files ending with "CA_30_30_jerk" in the save folder.
The NN architecture is changed compared to the architecture in the original repository. Two hidden layers of 30 neurons each, with ReLU activation, are used for all actor-critic evaluation and target networks to achieve the performance below. Collision avoidance guidance is activated and the TTC threshold is kept at 4.001.
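For reference, a minimal sketch of an actor and critic with this layout (two hidden layers of 30 ReLU units each) is given below. The framework, class names, state dimension, and action bound are illustrative assumptions and do not reproduce the repository's actual code.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: two hidden layers of 30 ReLU units each, as
# described above. The state dimension (spacing, follower speed, relative
# speed) and the acceleration bound are assumptions, not repository values.

class Actor(nn.Module):
    def __init__(self, state_dim=3, action_dim=1, action_bound=3.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 30), nn.ReLU(),
            nn.Linear(30, 30), nn.ReLU(),
            nn.Linear(30, action_dim), nn.Tanh(),  # bounded acceleration command
        )
        self.action_bound = action_bound

    def forward(self, state):
        return self.net(state) * self.action_bound


class Critic(nn.Module):
    def __init__(self, state_dim=3, action_dim=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 30), nn.ReLU(),
            nn.Linear(30, 30), nn.ReLU(),
            nn.Linear(30, 1),  # Q(s, a)
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))
```

The same layer sizes apply to both the evaluation and target copies of each network; in DDPG the target networks are typically soft-updated copies of the evaluation networks.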
The performance is shown below for some test datasets that were not seen during training. In most of the cases below, the trained DDPG RL policy fits very closely to the optimal policy computed by MPC, thus validating the optimality of the trained control policy.
In the last case, however, the trained DDPG RL policy differs substantially from the optimal policy computed by MPC, even though the objective functions are kept nearly identical. A possible reason is that the MPC control solution is optimal only under the assumption that the N-step future prediction of the states is actually followed by the leading and following vehicles, which is of course not guaranteed, since it depends on the intention of the leading vehicle. In this regard, the reinforcement learning approach is more promising: it does not need an N-step future prediction of the states and instead computes the optimal policy using only the current state information.
- LV: Leading Vehicle
- SV: Simulated Vehicle / Follower Vehicle
- RL: Using DDPG Reinforcement Learning
- MPC: Using Model Predictive Control
This is a modified version of the original repository. The contents of the original repository are given below.
Safe, efficient, and comfortable velocity control based on reinforcement learning for autonomous driving
Source code for paper Zhu, M., Wang, Y., Pu, Z., Hu, J., Wang, X., & Ke, R. (2020). Safe, efficient, and comfortable velocity control based on reinforcement learning for autonomous driving. Transportation Research Part C: Emerging Technologies, 117, 102662. https://www.sciencedirect.com/science/article/pii/S0968090X20305775
Use DDPG for car-following velocity control. The key part is the design of the reward function. If the reward is not properly designed, the vehicle will either have poor jerk performance or stop with zero speed (in which case the jerk is zero). So the weights between the different objectives are important.
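As an illustration of this trade-off, here is a hedged sketch of a weighted reward combining safety (TTC), efficiency (headway), and comfort (jerk) terms. The specific features, weights, and desired headway are assumptions for illustration, not the paper's exact reward function.

```python
import numpy as np

def reward(spacing, follower_speed, rel_speed, jerk,
           w_safe=1.0, w_eff=1.0, w_jerk=0.1, ttc_threshold=4.0):
    """Illustrative weighted car-following reward (not the paper's exact formula)."""
    # Safety: penalize a small time-to-collision when closing in on the leader.
    # Convention assumed here: rel_speed = leading speed - following speed,
    # so a negative value means the gap is shrinking.
    closing_speed = -rel_speed if rel_speed < 0 else 0.0
    ttc = spacing / closing_speed if closing_speed > 1e-6 else np.inf
    r_safe = np.log(ttc / ttc_threshold) if ttc < ttc_threshold else 0.0

    # Efficiency: discourage simply stopping by rewarding a reasonable headway.
    headway = spacing / max(follower_speed, 0.1)
    r_eff = -abs(headway - 1.5)  # 1.5 s desired headway is an assumed target

    # Comfort: penalize large jerk to avoid harsh accelerations.
    r_comfort = -(jerk ** 2)

    return w_safe * r_safe + w_eff * r_eff + w_jerk * r_comfort
```

If the comfort weight is too large relative to the others, the policy can converge to a near-standstill with zero jerk; if it is too small, the ride becomes harsh, which is exactly the balance described above.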
Each element (cell or matrix) in trainSet.mat and testSet.mat describes a car-following event. For each matrix (event), the columns are spacing, following vehicle speed, relative speed, and leading vehicle speed. Events may have different durations.
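For illustration, the events could be read in Python roughly as follows; the variable name stored inside the .mat file ("trainSet" here) is an assumption and may differ.

```python
from scipy.io import loadmat

data = loadmat('trainSet.mat')
events = data['trainSet'][0]          # assumed key; one cell per car-following event

first_event = events[0]               # matrix of shape (duration, 4)
spacing        = first_event[:, 0]    # gap to the leading vehicle
follower_speed = first_event[:, 1]    # following vehicle speed
rel_speed      = first_event[:, 2]    # relative speed
leader_speed   = first_event[:, 3]    # leading vehicle speed
```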
- Set up the Python environment by installing the required packages listed in requirements.txt
- Directly run Main.ipynb
- simulation_env is the simulation environment for car following
- MPC_acc is the MPC-based ACC implementation. This is a baseline.
@article{zhu2020safe,
  title={Safe, efficient, and comfortable velocity control based on reinforcement learning for autonomous driving},
  author={Zhu, Meixin and Wang, Yinhai and Pu, Ziyuan and Hu, Jingyun and Wang, Xuesong and Ke, Ruimin},
  journal={Transportation Research Part C: Emerging Technologies},
  volume={117},
  pages={102662},
  year={2020},
  publisher={Elsevier}
}