The Smoothed Complexity of Policy Iteration for Markov Decision Processes

Authors:

Miranda Christ,

Mihalis Yannakakis

STOC 2023: Proceedings of the 55th Annual ACM Symposium on Theory of Computing

Pages 1890 - 1903

https://doi.org/10.1145/3564246.3585220

Published: 02 June 2023

Abstract

We show subexponential lower bounds (i.e., 2^{Ω(n^c)}) on the smoothed complexity of the classical Howard’s Policy Iteration algorithm for Markov Decision Processes. The bounds hold for the total reward and the average reward criteria. The constructions are robust in the sense that the subexponential bound holds not only on average for independent random perturbations of the MDP parameters (transition probabilities and rewards), but for all arbitrary perturbations within an inverse polynomial range. We also show an exponential lower bound on the worst-case complexity for the simple reachability objective.
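For readers unfamiliar with the algorithm named in the abstract, the following is a minimal sketch of Howard's Policy Iteration, not the construction from the paper. It assumes a discounted reward criterion with a hypothetical factor `gamma`, chosen only because it keeps policy evaluation a well-posed linear system; the paper's bounds concern the total reward and average reward criteria instead. The function names, the data layout (`P[s][a]` as a list of next-state probabilities, `R[s][a]` as a scalar reward), and the two-state example are all illustrative assumptions.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def howard_policy_iteration(P, R, gamma=0.9, tol=1e-9):
    """Return (policy, values, rounds) for a discounted MDP (assumed criterion).

    rounds counts how many times the policy was changed before it stabilised.
    """
    n = len(P)
    policy = [0] * n  # start from action 0 in every state
    rounds = 0
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi.
        a_mat = [[(1.0 if i == j else 0.0) - gamma * P[i][policy[i]][j]
                  for j in range(n)] for i in range(n)]
        v = solve(a_mat, [R[i][policy[i]] for i in range(n)])

        def q(s, a):
            return R[s][a] + gamma * sum(p * v[t] for t, p in enumerate(P[s][a]))

        # Howard's rule: switch every state that has an improving action,
        # all at once, to the action with the best one-step lookahead value.
        new_policy = policy[:]
        for s in range(n):
            best = max(range(len(P[s])), key=lambda a: q(s, a))
            if q(s, best) > q(s, policy[s]) + tol:
                new_policy[s] = best
        if new_policy == policy:
            return policy, v, rounds
        policy = new_policy
        rounds += 1


# Hypothetical two-state, two-action MDP used only to exercise the sketch.
P = [[[1.0, 0.0], [0.0, 1.0]],   # state 0: stay, or move to state 1
     [[0.0, 1.0], [0.0, 1.0]]]   # state 1: both actions stay in state 1
R = [[0.0, 1.0],
     [0.0, 2.0]]
policy, values, rounds = howard_policy_iteration(P, R, gamma=0.5)
```

The quantity of interest for complexity questions is `rounds`, the number of improvement steps. Smoothed analysis asks how this count behaves when the entries of `P` and `R` are slightly perturbed.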

Index Terms

The Smoothed Complexity of Policy Iteration for Markov Decision Processes
1. Mathematics of computing
  1. Discrete mathematics
    1. Combinatorics
      1. Combinatorial optimization

Published In

STOC 2023: Proceedings of the 55th Annual ACM Symposium on Theory of Computing

June 2023

1926 pages

ISBN:9781450399135

DOI:10.1145/3564246

General Chair: Barna Saha, University of California at San Diego, USA

Program Chair: Rocco A. Servedio, Columbia University, USA

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGACT: ACM Special Interest Group on Algorithms and Computation Theory

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

NSF (National Science Foundation)

Conference

STOC '23: 55th Annual ACM Symposium on Theory of Computing

Sponsor: SIGACT

June 20 - 23, 2023

Orlando, FL, USA

Acceptance Rates

Overall Acceptance Rate 1,469 of 4,586 submissions, 32%
