DOI: 10.1145/3447928.3456653
Research Article · Open Access

Verifiably safe exploration for end-to-end reinforcement learning

Published: 19 May 2021
Abstract

    Deploying deep reinforcement learning in safety-critical settings requires developing algorithms that obey hard constraints during exploration. This paper contributes a first approach toward enforcing formal safety constraints on end-to-end policies with visual inputs. Our approach draws on recent advances in object detection and automated reasoning for hybrid dynamical systems. The approach is evaluated on a novel benchmark that emphasizes the challenge of safely exploring in the presence of hard constraints. Our benchmark draws from several proposed problem sets for safe learning and includes problems that emphasize challenges such as reward signals that are not aligned with safety constraints. On each of these benchmark problems, our algorithm completely avoids unsafe behavior while remaining competitive at optimizing for as much reward as is safe. We characterize safety constraints in terms of a refinement relation on Markov decision processes - rather than directly constraining the reinforcement learning algorithm so that it only takes safe actions, we instead refine the environment so that only safe actions are defined in the environment's transition structure. This has pragmatic system design benefits and, more importantly, provides a clean conceptual setting in which we are able to prove important safety and efficiency properties. These allow us to transform the constrained optimization problem of acting safely in the original environment into an unconstrained optimization in a refined environment.
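
    To make the refinement idea concrete, below is a minimal, hypothetical sketch (not the authors' implementation) of a Gymnasium-style environment wrapper: a verified monitor `is_safe` and a fallback controller `safe_fallback` (both assumed names) replace any unsafe proposed action before it reaches the underlying dynamics, so the learner only ever experiences transitions of the refined, safe environment. In the paper the monitor checks a symbolic state extracted from visual input by an object detector; in this sketch the raw observation stands in for that symbolic state.

    ```python
    # Illustrative sketch only: an environment wrapper that "refines" the
    # transition structure so unsafe actions are never executed.
    # `is_safe` and `safe_fallback` are hypothetical stand-ins for a formally
    # verified controller monitor and a verified fallback controller.
    from typing import Callable

    import gymnasium as gym
    import numpy as np


    class SafetyRefinedEnv(gym.Wrapper):
        """Refined environment in which only safe actions are defined.

        Rather than constraining the learning algorithm, the wrapper changes the
        environment: any action the monitor rejects is mapped to a known-safe
        fallback action before it reaches the underlying dynamics.
        """

        def __init__(
            self,
            env: gym.Env,
            is_safe: Callable[[np.ndarray, np.ndarray], bool],      # (state, action) -> bool
            safe_fallback: Callable[[np.ndarray], np.ndarray],       # state -> safe action
        ):
            super().__init__(env)
            self._is_safe = is_safe
            self._safe_fallback = safe_fallback
            self._last_obs = None

        def reset(self, **kwargs):
            obs, info = self.env.reset(**kwargs)
            self._last_obs = obs
            return obs, info

        def step(self, action):
            # Refinement step: unsafe proposals are replaced by the fallback
            # action, so the executed transition is always a safe one.
            if not self._is_safe(self._last_obs, action):
                action = self._safe_fallback(self._last_obs)
            obs, reward, terminated, truncated, info = self.env.step(action)
            self._last_obs = obs
            return obs, reward, terminated, truncated, info
    ```

    With a wrapper of this kind, an unmodified off-the-shelf learner (e.g., PPO) can be trained directly on the refined environment, which is the sense in which the constrained problem of acting safely becomes an unconstrained optimization problem.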




    Published In

    HSCC '21: Proceedings of the 24th International Conference on Hybrid Systems: Computation and Control
    May 2021
    300 pages
    ISBN: 9781450383394
    DOI: 10.1145/3447928
    This work is licensed under a Creative Commons Attribution 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. differential dynamic logic
    2. formal verification
    3. hybrid systems
    4. neural networks
    5. reinforcement learning
    6. safe artificial intelligence

    Qualifiers

    • Research-article

    Conference

    HSCC '21

    Acceptance Rates

    HSCC '21 Paper Acceptance Rate: 27 of 77 submissions, 35%
    Overall Acceptance Rate: 153 of 373 submissions, 41%

    Article Metrics

    • Downloads (last 12 months): 372
    • Downloads (last 6 weeks): 17
    Reflects downloads up to 10 Aug 2024

    Cited By

    • (2024) Safe Exploration in Reinforcement Learning by Reachability Analysis over Learned Models. Computer Aided Verification, pp. 232-255. DOI: 10.1007/978-3-031-65633-0_11. Online publication date: 26-Jul-2024.
    • (2023) Runtime Verification of Learning Properties for Reinforcement Learning Algorithms. Electronic Proceedings in Theoretical Computer Science 395, pp. 205-219. DOI: 10.4204/EPTCS.395.15. Online publication date: 15-Nov-2023.
    • (2023) A human-centered safe robot reinforcement learning framework with interactive behaviors. Frontiers in Neurorobotics 17. DOI: 10.3389/fnbot.2023.1280341. Online publication date: 9-Nov-2023.
    • (2023) Ablation Study of How Run Time Assurance Impacts the Training and Performance of Reinforcement Learning Agents. 2023 IEEE 9th International Conference on Space Mission Challenges for Information Technology (SMC-IT), pp. 45-55. DOI: 10.1109/SMC-IT56444.2023.00014. Online publication date: Jul-2023.
    • (2023) Provably Safe Reinforcement Learning via Action Projection Using Reachability Analysis and Polynomial Zonotopes. IEEE Open Journal of Control Systems 2, pp. 79-92. DOI: 10.1109/OJCSYS.2023.3256305. Online publication date: 2023.
    • (2023) Reducing Safety Interventions in Provably Safe Reinforcement Learning. 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 7515-7522. DOI: 10.1109/IROS55552.2023.10342464. Online publication date: 1-Oct-2023.
    • (2023) Data-driven safe gain-scheduling control. Asian Journal of Control 25(6), pp. 4171-4182. DOI: 10.1002/asjc.3169. Online publication date: 25-Jun-2023.
    • (2022) MLNav: Learning to Safely Navigate on Martian Terrains. IEEE Robotics and Automation Letters 7(2), pp. 5461-5468. DOI: 10.1109/LRA.2022.3156654. Online publication date: Apr-2022.
    • (2022) Safe reinforcement learning for real-time automatic control in a smart energy-hub. Applied Energy 309, 118403. DOI: 10.1016/j.apenergy.2021.118403. Online publication date: Mar-2022.
    • (2022) Online shielding for reinforcement learning. Innovations in Systems and Software Engineering 19(4), pp. 379-394. DOI: 10.1007/s11334-022-00480-4. Online publication date: 23-Sep-2022.
