DOI: 10.5555/3635637.3663036

PAS: Probably Approximate Safety Verification of Reinforcement Learning Policy Using Scenario Optimization

Published: 06 May 2024
  Abstract

    With the advancement of machine-learning-based automation, the problem of safety verification of such systems is becoming crucial, especially in safety-critical domains such as self-driving cars and robotics. Reinforcement learning (RL) is an emerging machine learning technique with many applications, including in safety-critical domains. The classical safety verification approach of making a binary decision about whether a system is safe or unsafe is particularly challenging for an RL system. Such an approach generally requires prior knowledge about the system, e.g., its transition model and the set of unsafe states in the environment, which is typically unavailable in a standard RL setting. Instead, this paper addresses the safety verification problem from a quantitative perspective, i.e., we quantify the safe behavior of the policy in terms of probability. We formulate the safety verification problem as a chance-constrained optimization using barrier certificates. We then use a sampling-based approach called scenario optimization to solve the chance-constrained problem, which yields the desired probabilistic guarantee on the safe behavior of the policy. Our extensive empirical evaluation shows the validity and robustness of our approach in three RL domains.
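
    The pipeline described in the abstract, posing the barrier-certificate conditions as a chance-constrained program, replacing the chance constraint with finitely many sampled constraints, and reading a probabilistic guarantee off scenario-optimization theory, can be illustrated with a small sketch. The snippet below is not the authors' implementation: it assumes a linear certificate B(x) = c . phi(x) over a hand-picked quadratic feature map, a hypothetical sampling routine (not shown) that rolls out the fixed policy and returns initial states, policy-induced transitions, and unsafe states, and the standard Campi-Garatti bound for convex scenario programs.

# Minimal sketch of scenario-based barrier-certificate verification.
# Hypothetical assumptions: linear certificate B(x) = c . phi(x), a user-supplied
# sampler of initial states, policy transitions (x, x'), and unsafe states.
import numpy as np
from scipy.optimize import linprog
from scipy.stats import binom


def phi(x):
    """Hypothetical feature map for the certificate (constant, linear, quadratic terms)."""
    x = np.asarray(x, dtype=float)
    return np.concatenate(([1.0], x, x ** 2))


def fit_barrier(init_states, transitions, unsafe_states):
    """Solve one scenario program: maximize the margin g subject to
        B(x0) <= 0         for every sampled initial state,
        B(xu) >= g         for every sampled unsafe state,
        B(x') - B(x) <= 0  for every sampled policy transition.
    A strictly positive optimal g indicates a certificate consistent with the samples."""
    d = len(phi(init_states[0]))           # number of certificate weights
    n_var = d + 1                          # weights plus the margin g
    A, b = [], []
    for x0 in init_states:                 # B(x0) <= 0
        A.append(np.concatenate((phi(x0), [0.0]))); b.append(0.0)
    for xu in unsafe_states:               # -B(xu) + g <= 0
        A.append(np.concatenate((-phi(xu), [1.0]))); b.append(0.0)
    for x, x_next in transitions:          # B(x') - B(x) <= 0
        A.append(np.concatenate((phi(x_next) - phi(x), [0.0]))); b.append(0.0)
    cost = np.zeros(n_var); cost[-1] = -1.0          # linprog minimizes, so maximize g
    bounds = [(-1.0, 1.0)] * d + [(0.0, 1.0)]        # keep the LP bounded
    res = linprog(cost, A_ub=np.array(A), b_ub=np.array(b), bounds=bounds)
    return res.x[:d], res.x[-1], n_var


def scenario_violation_bound(n_scenarios, n_var, beta=1e-3):
    """Smallest eps with sum_{i < n_var} C(N, i) eps^i (1 - eps)^(N - i) <= beta:
    with confidence at least 1 - beta, the scenario solution violates a freshly
    drawn random constraint with probability at most eps (Campi-Garatti bound)."""
    lo, hi = 0.0, 1.0
    for _ in range(60):                    # bisection on the binomial tail
        mid = 0.5 * (lo + hi)
        if binom.cdf(n_var - 1, n_scenarios, mid) <= beta:
            hi = mid
        else:
            lo = mid
    return hi

    For N sampled constraints and n_var decision variables, scenario_violation_bound(N, n_var, beta) returns an epsilon such that, with confidence at least 1 - beta over the sampling, the fitted certificate's defining conditions fail on at most an epsilon-fraction of freshly sampled states and transitions; this is the flavor of "probably approximate" guarantee the abstract refers to, though the paper's exact formulation may differ.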




      Information

      Published In

      AAMAS '24: Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems
      May 2024
      2898 pages
      ISBN: 9798400704864


      Publisher

      International Foundation for Autonomous Agents and Multiagent Systems

      Richland, SC


      Author Tags

      1. reinforcement learning
      2. safety verification
      3. scenario optimization

      Qualifiers

      • Research-article

      Conference

      AAMAS '24

      Acceptance Rates

      Overall Acceptance Rate 1,155 of 5,036 submissions, 23%

