DOI: 10.5555/3535850.3535996
Research Article · Public Access

Controller Synthesis for Omega-Regular and Steady-State Specifications

Published: 09 May 2022

Abstract

Given a Markov decision process (MDP) and a linear-time (ω-regular or Linear Temporal Logic) specification that reasons about the infinite-trace behavior of a system, the controller synthesis problem aims to compute the optimal policy that satisfies the specification. Recently, problems that reason over the complementary infinite-frequency behavior of systems have been proposed through the lens of steady-state planning or steady-state policy synthesis. This entails finding a control policy for an MDP such that the Markov chain induced by the solution policy satisfies a given set of constraints on its steady-state distribution. This paper studies a generalization of the controller synthesis problem for a linear-time specification under steady-state constraints on the asymptotic behavior of the agent. We present an algorithm to find a deterministic policy satisfying ω-regular and steady-state constraints by characterizing the solutions as an integer linear program, and experimentally evaluate our approach.
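To make the underlying formulation concrete, the sketch below (illustrative only, not the authors' algorithm or code) solves the standard LP relaxation of steady-state-constrained planning with SciPy on a hypothetical three-state, two-action MDP: the variables x[s, a] are long-run state-action frequencies, balance and normalization constraints force them to form a stationary distribution, and one extra linear constraint lower-bounds the frequency of a designated state. The paper's method goes further: it handles the ω-regular objective via a product with an automaton, supports multichain MDPs, and uses integer variables to extract a deterministic policy; none of that is reproduced here.

```python
# A minimal sketch, assuming a unichain MDP, of the occupation-measure
# LP behind steady-state planning.  The toy transition kernel P, the
# rewards R, and the 0.2 frequency bound are invented for illustration;
# this is the randomized LP relaxation, not the paper's ILP.
import numpy as np
from scipy.optimize import linprog

n_states, n_actions = 3, 2
P = np.zeros((n_actions, n_states, n_states))  # P[a, s, s'] = Pr(s' | s, a)
P[0] = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.8], [0.5, 0.0, 0.5]]
P[1] = [[0.1, 0.9, 0.0], [0.6, 0.4, 0.0], [0.0, 0.3, 0.7]]
R = np.array([[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]])  # reward r(s, a)

idx = lambda s, a: s * n_actions + a  # flatten (s, a) -> variable index
n_vars = n_states * n_actions

# Stationarity: for every state s, outflow equals inflow,
#   sum_a x[s,a] - sum_{s',a'} P(s | s',a') x[s',a'] = 0,
# plus the normalization sum_{s,a} x[s,a] = 1.
A_eq = np.zeros((n_states + 1, n_vars))
for s in range(n_states):
    for a in range(n_actions):
        A_eq[s, idx(s, a)] += 1.0
    for sp in range(n_states):
        for a in range(n_actions):
            A_eq[s, idx(sp, a)] -= P[a, sp, s]
A_eq[n_states, :] = 1.0
b_eq = np.zeros(n_states + 1)
b_eq[n_states] = 1.0

# Steady-state constraint (invented): state 2 must be visited with
# long-run frequency at least 0.2, i.e. x[2,0] + x[2,1] >= 0.2.
A_ub = np.zeros((1, n_vars))
A_ub[0, idx(2, 0)] = A_ub[0, idx(2, 1)] = -1.0
b_ub = np.array([-0.2])

# Maximize the expected average reward sum_{s,a} r(s,a) x[s,a]
# (linprog minimizes, hence the sign flip).
res = linprog(-R.reshape(-1), A_ub=A_ub, b_ub=b_ub,
              A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n_vars)
assert res.success, res.message

x = res.x.reshape(n_states, n_actions)
freq = x.sum(axis=1)                           # stationary state distribution
policy = x / np.maximum(freq, 1e-12)[:, None]  # pi(a | s), randomized
print("steady-state distribution:", freq)
print("policy:", policy)
```

Normalizing each row of x yields a randomized stationary policy; the deterministic policies the paper targets would instead require binary indicator variables per state-action pair, which is what turns this LP into an integer linear program.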


Cited By

  • (2024) Poster Abstract: MULTIGAIN 2.0: MDP controller synthesis for multiple mean-payoff, LTL and steady-state constraints. In Proceedings of the 27th ACM International Conference on Hybrid Systems: Computation and Control, pages 1–2. https://doi.org/10.1145/3641513.3652535
  • (2024) MULTIGAIN 2.0: MDP controller synthesis for multiple mean-payoff, LTL and steady-state constraints. In Proceedings of the 27th ACM International Conference on Hybrid Systems: Computation and Control, pages 1–7. https://doi.org/10.1145/3641513.3650135


    Published In

    AAMAS '22: Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems
    May 2022
    1990 pages
    ISBN: 9781450392136


    Publisher

    International Foundation for Autonomous Agents and Multiagent Systems

    Richland, SC

    Publication History

    Published: 09 May 2022

    Author Tags

    1. average reward
    2. constrained MDPs
    3. controller synthesis
    4. correct-by-construction
    5. expected reward
    6. linear temporal logic
    7. multichain MDPs
    8. omega-regular
    9. planning for deterministic actions
    10. steady-state

    Qualifiers

    • Research-article

    Funding Sources

    • Air Force Office of Scientific Research
    • National Science Foundation

    Conference

    AAMAS '22

    Acceptance Rates

    Overall acceptance rate: 1,155 of 5,036 submissions (23%)

    Article Metrics

    • Downloads (last 12 months): 20
    • Downloads (last 6 weeks): 7

    Reflects downloads up to 22 Sep 2024.
