DOI: 10.1145/3529399.3529432
Research Article

Automatically Learning Fallback Strategies with Model-Free Reinforcement Learning in Safety-Critical Driving Scenarios

Published: 10 June 2022

Abstract

When learning to behave in a stochastic environment where safety is critical, such as driving a vehicle in traffic, it is natural for human drivers to plan fallback strategies as a backup to use if there is ever an unexpected change in the environment. Expecting the unexpected, and planning for such outcomes, makes us more robust to unseen scenarios and may help prevent catastrophic failures. Control of Autonomous Vehicles (AVs) is particularly concerned with knowing when and how to use fallback strategies in the interest of safety. Because an AV only has imperfect information about its environment, it is important to have alternate strategies at the ready that might not have been deduced from the original training data distribution.
In this paper we present a principled approach for a model-free Reinforcement Learning (RL) agent to capture multiple modes of behaviour in an environment. We introduce an extra pseudo-reward term into the reward model to encourage exploration of areas of state-space different from those privileged by the optimal policy. We base this pseudo-reward on a distance metric between agent trajectories, forcing new policies to focus on areas of state-space different from those visited by the initial exploring agent. Throughout the paper, we refer to this training paradigm as learning fallback strategies.
We apply this method to an autonomous driving scenario and show that we are able to learn useful policies that would otherwise have been missed during training, and would therefore have been unavailable to the agent when executing the control algorithm.
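To make the pseudo-reward idea concrete, the following is a minimal, non-authoritative sketch of how a trajectory-distance bonus of this kind could be wired into reward shaping. The function names (trajectory_distance, fallback_reward), the choice of a mean Euclidean distance over vector-valued states, and the weighting coefficient beta are all illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def trajectory_distance(traj_a, traj_b):
    """Mean Euclidean distance between two state trajectories, compared over
    their common length. A hypothetical stand-in for the paper's metric."""
    n = min(len(traj_a), len(traj_b))
    a = np.asarray(traj_a[:n], dtype=float)
    b = np.asarray(traj_b[:n], dtype=float)
    return float(np.linalg.norm(a - b, axis=1).mean())

def fallback_reward(env_reward, current_traj, reference_trajs, beta=0.1):
    """Environment reward plus a pseudo-reward that grows with the distance
    between the current trajectory and those already covered by previously
    learned policies (assumed formulation)."""
    if not reference_trajs:
        return env_reward
    # Bonus is the distance to the *closest* reference trajectory, so the agent
    # is only rewarded for leaving every region already covered.
    bonus = min(trajectory_distance(current_traj, ref) for ref in reference_trajs)
    return env_reward + beta * bonus

# Illustrative usage: shaping the reward of a second agent after a first
# policy's trajectory has been recorded (2-D states are assumed here).
reference = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([2.0, 0.0])]
current = [np.array([0.0, 0.0]), np.array([1.0, 1.0]), np.array([2.0, 2.0])]
shaped = fallback_reward(1.0, current, [reference], beta=0.1)
```

In this sketch, a second agent's return increases with its minimum distance to previously learned trajectories, so maximising it pushes the new policy towards regions of state-space the earlier policy did not visit, which is the behaviour the abstract describes.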


      Published In

      ICMLT '22: Proceedings of the 2022 7th International Conference on Machine Learning Technologies
March 2022, 291 pages
ISBN: 9781450395748
DOI: 10.1145/3529399

      Publisher

Association for Computing Machinery, New York, NY, United States

      Author Tags

      1. Autonomous Vehicle
      2. Reinforcement Learning
      3. Safe Control

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      ICMLT 2022
