Abstract
Recent years have witnessed the rapid development of autonomous driving. Model-based and model-free reinforcement learning are two popular approaches to learning driving policies, and each has its own advantages. The Dyna framework is a promising way to combine those advantages and thereby improve learning efficiency and performance. Unfortunately, the classical Dyna framework cannot handle the continuous actions required in driving, and the interaction between its world model and the model-free reinforcement learning agent is limited to a unidirectional flow of data. To further improve the effectiveness and efficiency of driving-policy learning, this paper proposes a novel Gaussian-process-based Dyna-PPO approach. A Gaussian process, which is analytically tractable and well suited to small-sample problems, is introduced to build the world model. In addition, we design a mechanism that realizes bidirectional interaction between the world model and the policy model. Extensive experiments validate the effectiveness and robustness of the proposed approach: in our simulations, the driving distance of the vehicle improves by approximately 0.2×.
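As a concrete illustration of the world-model component, below is a minimal, self-contained sketch of fitting a Gaussian process to (state, action) → next-state transitions and drawing uncertainty-aware predictions that a Dyna-style planning step could feed back to a PPO agent. It uses GPyTorch; the toy dynamics, feature layout, and variable names are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (NOT the authors' code) of a GP world model: regress a
# next-state quantity from (state, action) features, then sample
# uncertainty-aware predictions for a Dyna-style planning step.
import torch
import gpytorch


class GPWorldModel(gpytorch.models.ExactGP):
    """Exact GP from a 3-D (state, action) feature vector to one scalar output."""

    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )


# Toy transition data: 50 samples of, say, [speed, heading, steering] -> next speed.
train_x = torch.rand(50, 3)
train_y = torch.sin(train_x.sum(dim=-1))  # stand-in for the true dynamics

likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = GPWorldModel(train_x, train_y, likelihood)

# Fit GP hyperparameters by maximizing the exact marginal log likelihood.
model.train()
likelihood.train()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
for _ in range(100):
    optimizer.zero_grad()
    loss = -mll(model(train_x), train_y)
    loss.backward()
    optimizer.step()

# "Planning" step: imagined transitions with predictive mean and variance,
# which a Dyna-PPO loop could use as extra training data for the policy.
model.eval()
likelihood.eval()
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    pred = likelihood(model(torch.rand(5, 3)))
    print(pred.mean, pred.variance)
```

The analytic predictive variance in the last step is what makes a GP attractive for this role: with only a small replay buffer, the planning step can weight or filter imagined transitions by the model's own uncertainty.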
Funding
This work was supported in part by the National Natural Science Foundation of China under Grant 62002369, and in part by the Scientific Research Project of National University of Defense Technology under Grant ZK19-03.
Author information
Guanlin Wu, Wenqi Fang and Ji Wang contributed equally to this work.
About this article
Cite this article
Wu, G., Fang, W., Wang, J. et al. Dyna-PPO reinforcement learning with Gaussian process for the continuous action decision-making in autonomous driving. Appl Intell 53, 16893–16907 (2023). https://doi.org/10.1007/s10489-022-04354-x