DOI: 10.1145/3447928.3456653
Research Article · Open Access

Verifiably safe exploration for end-to-end reinforcement learning

Published: 19 May 2021
Abstract

    Deploying deep reinforcement learning in safety-critical settings requires developing algorithms that obey hard constraints during exploration. This paper contributes a first approach toward enforcing formal safety constraints on end-to-end policies with visual inputs. Our approach draws on recent advances in object detection and automated reasoning for hybrid dynamical systems. The approach is evaluated on a novel benchmark that emphasizes the challenge of safely exploring in the presence of hard constraints. Our benchmark draws from several proposed problem sets for safe learning and includes problems that emphasize challenges such as reward signals that are not aligned with safety constraints. On each of these benchmark problems, our algorithm completely avoids unsafe behavior while remaining competitive at optimizing for as much reward as is safe. We characterize safety constraints in terms of a refinement relation on Markov decision processes - rather than directly constraining the reinforcement learning algorithm so that it only takes safe actions, we instead refine the environment so that only safe actions are defined in the environment's transition structure. This has pragmatic system design benefits and, more importantly, provides a clean conceptual setting in which we are able to prove important safety and efficiency properties. These allow us to transform the constrained optimization problem of acting safely in the original environment into an unconstrained optimization in a refined environment.
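
    To make the refinement idea concrete, below is a minimal, hypothetical sketch (not the authors' implementation) of a Gymnasium-style environment wrapper: a verified monitor `is_safe` and a fallback controller `safe_fallback` (both assumed names) replace any unsafe proposed action before it reaches the underlying dynamics, so the learner only ever experiences transitions of the refined, safe environment. In the paper the monitor checks a symbolic state extracted from visual input by an object detector; in this sketch the raw observation stands in for that symbolic state.

    ```python
    # Illustrative sketch only: an environment wrapper that "refines" the
    # transition structure so unsafe actions are never executed.
    # `is_safe` and `safe_fallback` are hypothetical stand-ins for a formally
    # verified controller monitor and a verified fallback controller.
    from typing import Callable

    import gymnasium as gym
    import numpy as np


    class SafetyRefinedEnv(gym.Wrapper):
        """Refined environment in which only safe actions are defined.

        Rather than constraining the learning algorithm, the wrapper changes the
        environment: any action the monitor rejects is mapped to a known-safe
        fallback action before it reaches the underlying dynamics.
        """

        def __init__(
            self,
            env: gym.Env,
            is_safe: Callable[[np.ndarray, np.ndarray], bool],      # (state, action) -> bool
            safe_fallback: Callable[[np.ndarray], np.ndarray],       # state -> safe action
        ):
            super().__init__(env)
            self._is_safe = is_safe
            self._safe_fallback = safe_fallback
            self._last_obs = None

        def reset(self, **kwargs):
            obs, info = self.env.reset(**kwargs)
            self._last_obs = obs
            return obs, info

        def step(self, action):
            # Refinement step: unsafe proposals are replaced by the fallback
            # action, so the executed transition is always a safe one.
            if not self._is_safe(self._last_obs, action):
                action = self._safe_fallback(self._last_obs)
            obs, reward, terminated, truncated, info = self.env.step(action)
            self._last_obs = obs
            return obs, reward, terminated, truncated, info
    ```

    With a wrapper of this kind, an unmodified off-the-shelf learner (e.g., PPO) can be trained directly on the refined environment, which is the sense in which the constrained problem of acting safely becomes an unconstrained optimization problem.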




    Published In

    HSCC '21: Proceedings of the 24th International Conference on Hybrid Systems: Computation and Control
    May 2021
    300 pages
    ISBN: 9781450383394
    DOI: 10.1145/3447928
    This work is licensed under a Creative Commons Attribution 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. differential dynamic logic
    2. formal verification
    3. hybrid systems
    4. neural networks
    5. reinforcement learning
    6. safe artificial intelligence

    Qualifiers

    • Research-article

    Conference

    HSCC '21

    Acceptance Rates

    HSCC '21 Paper Acceptance Rate: 27 of 77 submissions, 35%
    Overall Acceptance Rate: 153 of 373 submissions, 41%

    Article Metrics

    • Downloads (last 12 months): 372
    • Downloads (last 6 weeks): 17
    Reflects downloads up to 10 Aug 2024

    Cited By

    • (2024) Safe Exploration in Reinforcement Learning by Reachability Analysis over Learned Models. Computer Aided Verification, pp. 232-255. DOI: 10.1007/978-3-031-65633-0_11. Online publication date: 26-Jul-2024.
    • (2023) Runtime Verification of Learning Properties for Reinforcement Learning Algorithms. Electronic Proceedings in Theoretical Computer Science 395, pp. 205-219. DOI: 10.4204/EPTCS.395.15. Online publication date: 15-Nov-2023.
    • (2023) A human-centered safe robot reinforcement learning framework with interactive behaviors. Frontiers in Neurorobotics 17. DOI: 10.3389/fnbot.2023.1280341. Online publication date: 9-Nov-2023.
    • (2023) Ablation Study of How Run Time Assurance Impacts the Training and Performance of Reinforcement Learning Agents. 2023 IEEE 9th International Conference on Space Mission Challenges for Information Technology (SMC-IT), pp. 45-55. DOI: 10.1109/SMC-IT56444.2023.00014. Online publication date: Jul-2023.
    • (2023) Provably Safe Reinforcement Learning via Action Projection Using Reachability Analysis and Polynomial Zonotopes. IEEE Open Journal of Control Systems 2, pp. 79-92. DOI: 10.1109/OJCSYS.2023.3256305. Online publication date: 2023.
    • (2023) Reducing Safety Interventions in Provably Safe Reinforcement Learning. 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 7515-7522. DOI: 10.1109/IROS55552.2023.10342464. Online publication date: 1-Oct-2023.
    • (2023) Data-driven safe gain-scheduling control. Asian Journal of Control 25(6), pp. 4171-4182. DOI: 10.1002/asjc.3169. Online publication date: 25-Jun-2023.
    • (2022) MLNav: Learning to Safely Navigate on Martian Terrains. IEEE Robotics and Automation Letters 7(2), pp. 5461-5468. DOI: 10.1109/LRA.2022.3156654. Online publication date: Apr-2022.
    • (2022) Safe reinforcement learning for real-time automatic control in a smart energy-hub. Applied Energy 309, 118403. DOI: 10.1016/j.apenergy.2021.118403. Online publication date: Mar-2022.
    • (2022) Online shielding for reinforcement learning. Innovations in Systems and Software Engineering 19(4), pp. 379-394. DOI: 10.1007/s11334-022-00480-4. Online publication date: 23-Sep-2022.
