DOI: 10.1145/3529399.3529432
Research Article

Automatically Learning Fallback Strategies with Model-Free Reinforcement Learning in Safety-Critical Driving Scenarios

Published: 10 June 2022

Abstract

When learning to behave in a stochastic environment where safety is critical, such as driving a vehicle in traffic, it is natural for human drivers to plan fallback strategies as a backup to use if there is ever an unexpected change in the environment. Expecting the unexpected, and planning for such outcomes, makes us more robust to unseen scenarios and may help prevent catastrophic failures. Control of Autonomous Vehicles (AVs) is particularly concerned with knowing when and how to use fallback strategies in the interest of safety. Because an AV only has imperfect information about its environment, it is important to have alternate strategies at the ready that might not have been deduced from the original training data distribution.
In this paper we present a principled approach for a model-free Reinforcement Learning (RL) agent to capture multiple modes of behaviour in an environment. We introduce an extra pseudo-reward term into the reward model to encourage exploration of areas of state-space different from those privileged by the optimal policy. We base this pseudo-reward on a distance metric between agent trajectories, forcing new policies to focus on areas of state-space different from those visited by the initial exploring agent. Throughout the paper, we refer to this training paradigm as learning fallback strategies.
We apply this method to an autonomous driving scenario and show that we are able to learn useful policies that would otherwise have been missed during training, and would therefore have been unavailable to the agent when executing the control algorithm.
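To make the pseudo-reward idea concrete, the following is a minimal, non-authoritative sketch of how a trajectory-distance bonus of this kind could be wired into reward shaping. The function names (trajectory_distance, fallback_reward), the choice of a mean Euclidean distance over vector-valued states, and the weighting coefficient beta are all illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def trajectory_distance(traj_a, traj_b):
    """Mean Euclidean distance between two state trajectories, compared over
    their common length. A hypothetical stand-in for the paper's metric."""
    n = min(len(traj_a), len(traj_b))
    a = np.asarray(traj_a[:n], dtype=float)
    b = np.asarray(traj_b[:n], dtype=float)
    return float(np.linalg.norm(a - b, axis=1).mean())

def fallback_reward(env_reward, current_traj, reference_trajs, beta=0.1):
    """Environment reward plus a pseudo-reward that grows with the distance
    between the current trajectory and those already covered by previously
    learned policies (assumed formulation)."""
    if not reference_trajs:
        return env_reward
    # Bonus is the distance to the *closest* reference trajectory, so the agent
    # is only rewarded for leaving every region already covered.
    bonus = min(trajectory_distance(current_traj, ref) for ref in reference_trajs)
    return env_reward + beta * bonus

# Illustrative usage: shaping the reward of a second agent after a first
# policy's trajectory has been recorded (2-D states are assumed here).
reference = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([2.0, 0.0])]
current = [np.array([0.0, 0.0]), np.array([1.0, 1.0]), np.array([2.0, 2.0])]
shaped = fallback_reward(1.0, current, [reference], beta=0.1)
```

In this sketch, a second agent's return increases with its minimum distance to previously learned trajectories, so maximising it pushes the new policy towards regions of state-space the earlier policy did not visit, which is the behaviour the abstract describes.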


      Published In

      ICMLT '22: Proceedings of the 2022 7th International Conference on Machine Learning Technologies
March 2022, 291 pages
ISBN: 9781450395748
DOI: 10.1145/3529399

      Publisher

Association for Computing Machinery, New York, NY, United States

      Author Tags

      1. Autonomous Vehicle
      2. Reinforcement Learning
      3. Safe Control

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      ICMLT 2022
