DOI: 10.5555/3635637.3663036

PAS: Probably Approximate Safety Verification of Reinforcement Learning Policy Using Scenario Optimization

Published: 06 May 2024
  Abstract

    With the advancement of machine-learning-based automation, the problem of safety verification of such systems is becoming crucial, especially in safety-critical domains such as self-driving cars and robotics. Reinforcement learning (RL) is an emerging machine learning technique with many applications, including in safety-critical domains. The classical safety verification approach of making a binary decision about whether a system is safe or unsafe is particularly challenging for an RL system. Such an approach generally requires prior knowledge about the system, e.g., its transition model and the set of unsafe states in the environment, which is typically unavailable in a standard RL setting. Instead, this paper addresses the safety verification problem from a quantitative perspective, i.e., we quantify the safe behavior of the policy in terms of probability. We formulate the safety verification problem as a chance-constrained optimization using barrier certificates. We then use a sampling-based approach called scenario optimization to solve the chance-constrained problem, which yields the desired probabilistic guarantee on the safe behavior of the policy. Our extensive empirical evaluation shows the validity and robustness of our approach in three RL domains.
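
    The pipeline described in the abstract, posing the barrier-certificate conditions as a chance-constrained program, replacing the chance constraint with finitely many sampled constraints, and reading a probabilistic guarantee off scenario-optimization theory, can be illustrated with a small sketch. The snippet below is not the authors' implementation: it assumes a linear certificate B(x) = c . phi(x) over a hand-picked quadratic feature map, a hypothetical sampling routine (not shown) that rolls out the fixed policy and returns initial states, policy-induced transitions, and unsafe states, and the standard Campi-Garatti bound for convex scenario programs.

# Minimal sketch of scenario-based barrier-certificate verification.
# Hypothetical assumptions: linear certificate B(x) = c . phi(x), a user-supplied
# sampler of initial states, policy transitions (x, x'), and unsafe states.
import numpy as np
from scipy.optimize import linprog
from scipy.stats import binom


def phi(x):
    """Hypothetical feature map for the certificate (constant, linear, quadratic terms)."""
    x = np.asarray(x, dtype=float)
    return np.concatenate(([1.0], x, x ** 2))


def fit_barrier(init_states, transitions, unsafe_states):
    """Solve one scenario program: maximize the margin g subject to
        B(x0) <= 0         for every sampled initial state,
        B(xu) >= g         for every sampled unsafe state,
        B(x') - B(x) <= 0  for every sampled policy transition.
    A strictly positive optimal g indicates a certificate consistent with the samples."""
    d = len(phi(init_states[0]))           # number of certificate weights
    n_var = d + 1                          # weights plus the margin g
    A, b = [], []
    for x0 in init_states:                 # B(x0) <= 0
        A.append(np.concatenate((phi(x0), [0.0]))); b.append(0.0)
    for xu in unsafe_states:               # -B(xu) + g <= 0
        A.append(np.concatenate((-phi(xu), [1.0]))); b.append(0.0)
    for x, x_next in transitions:          # B(x') - B(x) <= 0
        A.append(np.concatenate((phi(x_next) - phi(x), [0.0]))); b.append(0.0)
    cost = np.zeros(n_var); cost[-1] = -1.0          # linprog minimizes, so maximize g
    bounds = [(-1.0, 1.0)] * d + [(0.0, 1.0)]        # keep the LP bounded
    res = linprog(cost, A_ub=np.array(A), b_ub=np.array(b), bounds=bounds)
    return res.x[:d], res.x[-1], n_var


def scenario_violation_bound(n_scenarios, n_var, beta=1e-3):
    """Smallest eps with sum_{i < n_var} C(N, i) eps^i (1 - eps)^(N - i) <= beta:
    with confidence at least 1 - beta, the scenario solution violates a freshly
    drawn random constraint with probability at most eps (Campi-Garatti bound)."""
    lo, hi = 0.0, 1.0
    for _ in range(60):                    # bisection on the binomial tail
        mid = 0.5 * (lo + hi)
        if binom.cdf(n_var - 1, n_scenarios, mid) <= beta:
            hi = mid
        else:
            lo = mid
    return hi

    For N sampled constraints and n_var decision variables, scenario_violation_bound(N, n_var, beta) returns an epsilon such that, with confidence at least 1 - beta over the sampling, the fitted certificate's defining conditions fail on at most an epsilon-fraction of freshly sampled states and transitions; this is the flavor of "probably approximate" guarantee the abstract refers to, though the paper's exact formulation may differ.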




      Information

      Published In

      AAMAS '24: Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems
      May 2024
      2898 pages
      ISBN: 9798400704864


      Publisher

      International Foundation for Autonomous Agents and Multiagent Systems

      Richland, SC


      Author Tags

      1. reinforcement learning
      2. safety verification
      3. scenario optimization

      Qualifiers

      • Research-article

      Conference

      AAMAS '24

      Acceptance Rates

      Overall Acceptance Rate 1,155 of 5,036 submissions, 23%

