research-article

Value iteration for simple stochastic games: Stopping criterion and learning algorithm

Authors:

Julia Eisentraut,

Jan Křetínský,

Maximilian Weininger

Information and Computation, Volume 285, Issue PB

https://doi.org/10.1016/j.ic.2022.104886

Published: 01 May 2022

Abstract

The classical problem of reachability in simple stochastic games is typically solved by value iteration (VI), which produces a sequence of under-approximations of the value of the game but is only guaranteed to converge in the limit. We provide an additional converging sequence of over-approximations, based on an analysis of the game graph. Together, these two sequences entail the first error bound and hence the first stopping criterion for VI on simple stochastic games, indicating when the algorithm can be stopped for a given precision. Consequently, VI becomes an anytime algorithm returning the approximation of the value and the current error bound. We further use this error bound to provide a learning-based asynchronous VI algorithm; it uses simulations and thus often avoids exploring the whole game graph, but still yields the same guarantees. Finally, we experimentally show that the overhead for computing the additional sequence of over-approximations is often negligible.
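The interplay of under- and over-approximations described in the abstract can be illustrated with a small Python sketch. It is not the authors' algorithm: it assumes the game has no end components apart from the absorbing target and sink states. Inside an end component, plain iteration of the upper bound can stay above the true value, and handling that case is what the paper's analysis of the game graph is for; that part is not reproduced here. The state encoding, the function name `bounded_vi` and the example game are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding (not from the paper): every action is a probability
# distribution over successors, given as (successor, probability) pairs.
Action = List[Tuple[str, float]]


def bounded_vi(
    owner: Dict[str, str],             # state -> "max" or "min"
    actions: Dict[str, List[Action]],  # state -> available actions
    target: Set[str],                  # absorbing states with value 1
    sink: Set[str],                    # absorbing states with value 0
    eps: float = 1e-6,
    max_iters: int = 100_000,
) -> Tuple[Dict[str, float], Dict[str, float], float]:
    """Iterate a lower and an upper bound together; stop once they are eps-close.

    ASSUMPTION: apart from `target` and `sink`, the game has no end components.
    Without this, the upper bound may get stuck above the true value.
    """
    lower = {s: 1.0 if s in target else 0.0 for s in owner}
    upper = {s: 0.0 if s in sink else 1.0 for s in owner}
    for _ in range(max_iters):
        error = max(upper[s] - lower[s] for s in owner)
        if error < eps:  # stopping criterion: every value is known up to eps
            return lower, upper, error
        new_lower, new_upper = dict(lower), dict(upper)
        for s in owner:
            if s in target or s in sink:
                continue  # absorbing states keep their fixed values
            pick = max if owner[s] == "max" else min
            new_lower[s] = pick(sum(p * lower[t] for t, p in a) for a in actions[s])
            new_upper[s] = pick(sum(p * upper[t] for t, p in a) for a in actions[s])
        lower, upper = new_lower, new_upper
    # Anytime behaviour: out of budget, return the bounds and the current error.
    return lower, upper, max(upper[s] - lower[s] for s in owner)


# Example: s (Maximizer) and m (Minimizer) form a cycle, but every action
# available at m leaves {s, m} with positive probability, so there is no end
# component and both bounds converge to the value 0.8.
owner = {"s": "max", "m": "min", "t": "max", "z": "max"}
actions = {
    "s": [[("t", 0.5), ("z", 0.5)], [("m", 1.0)]],
    "m": [[("s", 0.5), ("t", 0.5)], [("z", 0.2), ("t", 0.8)]],
    "t": [[("t", 1.0)]],
    "z": [[("z", 1.0)]],
}
lower, upper, error = bounded_vi(owner, actions, target={"t"}, sink={"z"})
print(lower["s"], upper["s"], error)  # prints 0.8 0.8 0.0
```

Because the loop returns the current bounds together with their gap when the iteration budget runs out, the sketch also reflects the anytime reading of VI mentioned in the abstract; the simulation-based asynchronous variant is not shown.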

Cited By

A. Hartmanns, B. Kohlen, P. Lammich (2024). Efficient Formally Verified Maximal End Component Decomposition for MDPs. In: Formal Methods, pp. 206–225. DOI: 10.1007/978-3-031-71162-6_11. Online publication date: 9 Sep 2024.
https://dl.acm.org/doi/10.1007/978-3-031-71162-6_11

T. Meggendorfer, M. Weininger (2024). Playing Games with Your PET: Extending the Partial Exploration Tool to Stochastic Games. In: Computer Aided Verification, pp. 359–372. DOI: 10.1007/978-3-031-65633-0_16. Online publication date: 24 Jul 2024.
https://dl.acm.org/doi/10.1007/978-3-031-65633-0_16

M. Azeem, A. Evangelidis, J. Křetínský, A. Slivinskiy, M. Weininger (2022). Optimistic and Topological Value Iteration for Simple Stochastic Games. In: Automated Technology for Verification and Analysis, pp. 285–302. DOI: 10.1007/978-3-031-19992-9_18. Online publication date: 25 Oct 2022.
https://dl.acm.org/doi/10.1007/978-3-031-19992-9_18

Published In

Information and Computation, Volume 285, Issue PB, May 2022, 998 pages

ISSN: 0890-5401

© The Authors.

Publisher

Academic Press, Inc.

United States
