DOI: 10.1145/3564246.3585220
research-article
Public Access

The Smoothed Complexity of Policy Iteration for Markov Decision Processes

Published: 02 June 2023

Abstract

We show subexponential lower bounds (i.e., $2^{\Omega(n^c)}$) on the smoothed complexity of the classical Howard's Policy Iteration algorithm for Markov Decision Processes. The bounds hold for both the total reward and the average reward criteria. The constructions are robust in the sense that the subexponential bound holds not only on average for independent random perturbations of the MDP parameters (transition probabilities and rewards), but for all arbitrary perturbations within an inverse polynomial range. We also show an exponential lower bound on the worst-case complexity for the simple reachability objective.
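To make the objects in the abstract concrete, here is a minimal Python sketch under stated assumptions; it is an illustration, not the authors' construction. It encodes a total-reward MDP with an absorbing terminal state (assuming every policy reaches that state with probability 1), runs Howard's rule (every improvable state switches to a best action simultaneously) while counting improvement steps, and applies a hypothetical perturbation that shifts rewards and rescales positive transition probabilities within ±delta. The names `howard_pi`, `perturb`, `EXAMPLE`, and the perturbation model itself are assumptions for illustration; the paper's lower-bound MDPs are not reproduced here.

```python
import random
from typing import List, Tuple

# Hypothetical encoding of a total-reward MDP: states 0..n-1 are non-terminal;
# any probability mass an action does not assign to them goes to an absorbing
# terminal state with reward 0. mdp[s] lists (reward, probs) pairs, where
# probs[t] = Pr[s -> t]. Assumption: every policy reaches the terminal state
# with probability 1, so each policy's total reward is finite.
Action = Tuple[float, List[float]]
MDP = List[List[Action]]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def evaluate(mdp: MDP, policy: List[int]) -> List[float]:
    """Policy evaluation: v = r_pi + P_pi v, i.e. solve (I - P_pi) v = r_pi."""
    n = len(mdp)
    a = [[(1.0 if s == t else 0.0) - mdp[s][policy[s]][1][t] for t in range(n)]
         for s in range(n)]
    b = [mdp[s][policy[s]][0] for s in range(n)]
    return solve(a, b)


def q_value(mdp: MDP, v: List[float], s: int, i: int) -> float:
    """One-step lookahead: take action i in state s, then collect v."""
    r, p = mdp[s][i]
    return r + sum(pt * vt for pt, vt in zip(p, v))


def howard_pi(mdp: MDP, policy: List[int], eps: float = 1e-9) -> Tuple[List[int], int]:
    """Howard's rule: every improvable state switches to a best action at once.
    Returns the final policy and the number of improvement steps taken."""
    policy = policy[:]
    steps = 0
    while True:
        v = evaluate(mdp, policy)
        changed = False
        for s in range(len(mdp)):
            best = max(range(len(mdp[s])), key=lambda i: q_value(mdp, v, s, i))
            if q_value(mdp, v, s, best) > v[s] + eps:  # strict improvement only
                policy[s] = best
                changed = True
        if not changed:
            return policy, steps
        steps += 1


def perturb(mdp: MDP, delta: float, rng: random.Random) -> MDP:
    """Hypothetical smoothing model (0 <= delta < 1): shift each reward by
    U[-delta, delta] and scale each transition probability (including the
    terminal mass) by 1 + U[-delta, delta], then renormalize. Zero entries
    stay zero, so the support, and hence the stopping property, is preserved."""
    out: MDP = []
    for actions in mdp:
        new_actions: List[Action] = []
        for r, p in actions:
            term = max(0.0, 1.0 - sum(p))  # mass sent to the terminal state
            full = [x * (1 + rng.uniform(-delta, delta)) for x in p + [term]]
            z = sum(full)
            new_actions.append((r + rng.uniform(-delta, delta),
                                [x / z for x in full[:-1]]))
        out.append(new_actions)
    return out


# Toy 2-state instance (hypothetical, not from the paper).
EXAMPLE: MDP = [
    [(1.0, [0.0, 0.0]), (0.0, [0.0, 1.0])],  # state 0: stop with 1, or move to state 1
    [(2.0, [0.0, 0.0]), (0.0, [0.5, 0.0])],  # state 1: stop with 2, or go to 0 w.p. 1/2
]

if __name__ == "__main__":
    print(howard_pi(EXAMPLE, [0, 1]))  # → ([1, 0], 2)
    print(howard_pi(perturb(EXAMPLE, 0.01, random.Random(0)), [0, 1]))
```

On this toy instance Howard's rule reaches the optimal policy `[1, 0]` after two improvement steps, and a small perturbation does not change that. The abstract's claim concerns how the step count grows with the number of states n on specially constructed MDPs, where it stays $2^{\Omega(n^c)}$ even after perturbations within an inverse polynomial range.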


    Information

    Published In

    STOC 2023: Proceedings of the 55th Annual ACM Symposium on Theory of Computing
    June 2023
    1926 pages
    ISBN:9781450399135
    DOI:10.1145/3564246
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Policy Iteration
    2. Smoothed Analysis

    Qualifiers

    • Research-article

    Conference

    STOC '23

    Acceptance Rates

    Overall Acceptance Rate 1,469 of 4,586 submissions, 32%

    Article Metrics

    • Total citations: 0
    • Total downloads: 159
    • Downloads (last 12 months): 142
    • Downloads (last 6 weeks): 19
    Reflects downloads up to 02 Sep 2024