research-article

Value iteration for simple stochastic games: Stopping criterion and learning algorithm

Published: 01 May 2022

Abstract

The classical problem of reachability in simple stochastic games is typically solved by value iteration (VI), which produces a sequence of under-approximations of the value of the game but is only guaranteed to converge in the limit. We provide an additional converging sequence of over-approximations, based on an analysis of the game graph. Together, these two sequences yield the first error bound, and hence the first stopping criterion, for VI on simple stochastic games, indicating when the algorithm can be stopped for a given precision. Consequently, VI becomes an anytime algorithm that returns an approximation of the value together with the current error bound. We further use this error bound to provide a learning-based asynchronous VI algorithm; it uses simulations and thus often avoids exploring the whole game graph, while still yielding the same guarantees. Finally, we show experimentally that the overhead of computing the additional sequence of over-approximations is often negligible.
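The abstract pairs the usual lower-bound iteration with a second, upper-bound sequence derived from an analysis of the game graph, and stops once the two bounds are close. The Python sketch below illustrates that idea under our own assumptions; it is not the authors' implementation. The game encoding (`owner`, `actions`, `target`) and all function names are hypothetical. Both bounds receive the standard Bellman update; the upper bound is then additionally lowered on every maximal end component of the sub-game in which Min may only use actions minimising the current lower bound, down to the best value Max can obtain by leaving that component (0 if Max cannot leave). That restriction rule and the float tolerance `1e-12` are our simplifications; the exact construction and its convergence proof are in the paper.

```python
from collections import defaultdict

# Hypothetical encoding (an assumption, not the paper's): owner[s] is "max" or
# "min"; actions[s] is a non-empty list of distributions {successor: probability}.


def q(f, dist):
    """Expected value of f after one step drawn from dist."""
    return sum(p * f[t] for t, p in dist.items())


def scc_roots(nodes, edges):
    """Tarjan's algorithm: map every node to the root of its SCC."""
    index, low, on_stack, stack, root = {}, {}, set(), [], {}

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in edges[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v roots an SCC: pop its members
            while True:
                w = stack.pop()
                on_stack.discard(w)
                root[w] = v
                if w == v:
                    break

    for v in nodes:
        if v not in index:
            visit(v)
    return root


def maximal_end_components(enabled, actions):
    """MECs of the sub-game with states enabled.keys() and actions enabled[s]."""
    cand = {s: set(acts) for s, acts in enabled.items()}
    while True:
        edges = {s: {t for a in cand[s] for t in actions[s][a] if t in cand} for s in cand}
        root = scc_roots(list(cand), edges)
        changed = False
        for s in list(cand):
            # Keep only actions whose successors all stay in the SCC of s.
            keep = {a for a in cand[s] if all(root.get(t) == root[s] for t in actions[s][a])}
            changed |= keep != cand[s]
            cand[s] = keep
        for s in [s for s in cand if not cand[s]]:
            del cand[s]  # no action left, so s lies in no end component
            changed = True
        if not changed:
            groups = defaultdict(set)
            for s in cand:
                groups[root[s]].add(s)
            return list(groups.values())


def bounded_vi(owner, actions, target, s0, eps=1e-6, max_iter=100_000):
    """Iterate lower bound L and upper bound U until U[s0] - L[s0] < eps."""
    L = {s: 1.0 if s in target else 0.0 for s in owner}
    U = {s: 1.0 for s in owner}
    for _ in range(max_iter):
        if U[s0] - L[s0] < eps:  # stopping criterion from the two bounds
            break
        for f in (L, U):  # synchronous Bellman update; targets stay at 1
            f.update({s: (max if owner[s] == "max" else min)(q(f, d) for d in actions[s])
                      for s in owner if s not in target})
        # Min may only use actions minimising L(s, .); Max keeps all actions.
        enabled = {}
        for s in owner:
            if s not in target:
                vals = [q(L, d) for d in actions[s]]
                enabled[s] = [i for i, v in enumerate(vals)
                              if owner[s] == "max" or v <= min(vals) + 1e-12]
        for ec in maximal_end_components(enabled, actions):
            # Best value Max can get by leaving ec; 0 if Max has no way out.
            exits = [q(U, actions[s][a]) for s in ec if owner[s] == "max"
                     for a in range(len(actions[s]))
                     if any(t not in ec for t in actions[s][a])]
            best_exit = max(exits, default=0.0)
            for s in ec:
                U[s] = min(U[s], best_exit)
    return L, U


# Max state 0 can go to Min state 1 or gamble 1/2-1/2 on target 2 vs sink 3;
# Min state 1 can go back to 0 or to the target, so the value of 0 is 1/2.
owner = {0: "max", 1: "min", 2: "max", 3: "max"}
actions = {0: [{1: 1.0}, {2: 0.5, 3: 0.5}], 1: [{0: 1.0}, {2: 1.0}],
           2: [{2: 1.0}], 3: [{3: 1.0}]}
L, U = bounded_vi(owner, actions, target={2}, s0=0)
print(L[0], U[0])  # 0.5 0.5
```

In this example, Bellman updates alone would leave the upper bound of states 0 and 1 at 1 forever, because Min can keep bouncing the play between them; lowering the bound on the component {0, 1} is what lets U[0] reach 1/2. The learning-based variant from the abstract would apply the same updates only to states visited by simulations, which this sketch does not attempt.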

Cited By

  • (2024) Efficient Formally Verified Maximal End Component Decomposition for MDPs. Formal Methods, pp. 206-225. DOI: 10.1007/978-3-031-71162-6_11. Online publication date: 9-Sep-2024.
  • (2024) Playing Games with Your PET: Extending the Partial Exploration Tool to Stochastic Games. Computer Aided Verification, pp. 359-372. DOI: 10.1007/978-3-031-65633-0_16. Online publication date: 24-Jul-2024.
  • (2022) Optimistic and Topological Value Iteration for Simple Stochastic Games. Automated Technology for Verification and Analysis, pp. 285-302. DOI: 10.1007/978-3-031-19992-9_18. Online publication date: 25-Oct-2022.

Information

Published In

Information and Computation, Volume 285, Issue PB
May 2022
998 pages

Publisher

Academic Press, Inc.

United States

Author Tags

  1. Probabilistic verification
  2. Stochastic games
  3. Markov decision processes
  4. Value iteration
  5. Reachability

Bibliometrics

  • Downloads (last 12 months): 0
  • Downloads (last 6 weeks): 0

Reflects downloads up to 16 Oct 2024
