DOI: 10.5555/3535850.3535996
Research Article · Public Access

Controller Synthesis for Omega-Regular and Steady-State Specifications

Published: 09 May 2022

Abstract

Given a Markov decision process (MDP) and a linear-time (ω-regular or Linear Temporal Logic) specification that reasons about the infinite-trace behavior of a system, the controller synthesis problem aims to compute the optimal policy that satisfies the specification. Recently, problems that reason over the complementary infinite-frequency behavior of systems have been proposed through the lens of steady-state planning or steady-state policy synthesis. This entails finding a control policy for an MDP such that the Markov chain induced by the solution policy satisfies a given set of constraints on its steady-state distribution. This paper studies a generalization of the controller synthesis problem for a linear-time specification under steady-state constraints on the asymptotic behavior of the agent. We present an algorithm to find a deterministic policy satisfying ω-regular and steady-state constraints by characterizing the solutions as an integer linear program, and experimentally evaluate our approach.
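To make the underlying formulation concrete, the sketch below (illustrative only, not the authors' algorithm or code) solves the standard LP relaxation of steady-state-constrained planning with SciPy on a hypothetical three-state, two-action MDP: the variables x[s, a] are long-run state-action frequencies, balance and normalization constraints force them to form a stationary distribution, and one extra linear constraint lower-bounds the frequency of a designated state. The paper's method goes further: it handles the ω-regular objective via a product with an automaton, supports multichain MDPs, and uses integer variables to extract a deterministic policy; none of that is reproduced here.

```python
# A minimal sketch, assuming a unichain MDP, of the occupation-measure
# LP behind steady-state planning.  The toy transition kernel P, the
# rewards R, and the 0.2 frequency bound are invented for illustration;
# this is the randomized LP relaxation, not the paper's ILP.
import numpy as np
from scipy.optimize import linprog

n_states, n_actions = 3, 2
P = np.zeros((n_actions, n_states, n_states))  # P[a, s, s'] = Pr(s' | s, a)
P[0] = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.8], [0.5, 0.0, 0.5]]
P[1] = [[0.1, 0.9, 0.0], [0.6, 0.4, 0.0], [0.0, 0.3, 0.7]]
R = np.array([[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]])  # reward r(s, a)

idx = lambda s, a: s * n_actions + a  # flatten (s, a) -> variable index
n_vars = n_states * n_actions

# Stationarity: for every state s, outflow equals inflow,
#   sum_a x[s,a] - sum_{s',a'} P(s | s',a') x[s',a'] = 0,
# plus the normalization sum_{s,a} x[s,a] = 1.
A_eq = np.zeros((n_states + 1, n_vars))
for s in range(n_states):
    for a in range(n_actions):
        A_eq[s, idx(s, a)] += 1.0
    for sp in range(n_states):
        for a in range(n_actions):
            A_eq[s, idx(sp, a)] -= P[a, sp, s]
A_eq[n_states, :] = 1.0
b_eq = np.zeros(n_states + 1)
b_eq[n_states] = 1.0

# Steady-state constraint (invented): state 2 must be visited with
# long-run frequency at least 0.2, i.e. x[2,0] + x[2,1] >= 0.2.
A_ub = np.zeros((1, n_vars))
A_ub[0, idx(2, 0)] = A_ub[0, idx(2, 1)] = -1.0
b_ub = np.array([-0.2])

# Maximize the expected average reward sum_{s,a} r(s,a) x[s,a]
# (linprog minimizes, hence the sign flip).
res = linprog(-R.reshape(-1), A_ub=A_ub, b_ub=b_ub,
              A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n_vars)
assert res.success, res.message

x = res.x.reshape(n_states, n_actions)
freq = x.sum(axis=1)                           # stationary state distribution
policy = x / np.maximum(freq, 1e-12)[:, None]  # pi(a | s), randomized
print("steady-state distribution:", freq)
print("policy:", policy)
```

Normalizing each row of x yields a randomized stationary policy; the deterministic policies the paper targets would instead require binary indicator variables per state-action pair, which is what turns this LP into an integer linear program.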


Cited By

  • (2024) Poster Abstract: MULTIGAIN 2.0: MDP controller synthesis for multiple mean-payoff, LTL and steady-state constraints. In Proceedings of the 27th ACM International Conference on Hybrid Systems: Computation and Control, pages 1–2. https://doi.org/10.1145/3641513.3652535
  • (2024) MULTIGAIN 2.0: MDP controller synthesis for multiple mean-payoff, LTL and steady-state constraints. In Proceedings of the 27th ACM International Conference on Hybrid Systems: Computation and Control, pages 1–7. https://doi.org/10.1145/3641513.3650135


    Published In

    AAMAS '22: Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems
    May 2022
    1990 pages
    ISBN: 9781450392136


    Publisher

    International Foundation for Autonomous Agents and Multiagent Systems

    Richland, SC

    Publication History

    Published: 09 May 2022

    Author Tags

    1. average reward
    2. constrained MDPs
    3. controller synthesis
    4. correct-by-construction
    5. expected reward
    6. linear temporal logic
    7. multichain MDPs
    8. omega-regular
    9. planning for deterministic actions
    10. steady-state

    Qualifiers

    • Research-article

    Funding Sources

    • Air Force Office of Scientific Research
    • National Science Foundation

    Conference

    AAMAS '22

    Acceptance Rates

    Overall acceptance rate: 1,155 of 5,036 submissions (23%)

    Article Metrics

    • Downloads (last 12 months): 20
    • Downloads (last 6 weeks): 7

    Reflects downloads up to 22 Sep 2024.
