
Complexity of finite-horizon Markov decision process problems

Published: 01 July 2000

Abstract

    Controlled stochastic systems occur in science, engineering, manufacturing, the social sciences, and many other contexts. If the system is modeled as a Markov decision process (MDP) and will run ad infinitum, the optimal control policy can be computed in polynomial time using linear programming. The problems considered here assume that the time the process will run is finite and bounded by the size of the input. Many factors compound the complexity of computing the optimal policy. For instance, if the controller does not have complete information about the state of the system, or if the system is represented in some very succinct manner, the optimal policy is provably not computable in time polynomial in the size of the input. We analyze the computational complexity of evaluating policies and of determining whether a sufficiently good policy exists for an MDP, based on a number of confounding factors, including the observability of the system state, the succinctness of the representation, the type of policy, and even the number of actions relative to the number of states. In almost every case, we show that the decision problem is complete for some known complexity class. Some of these results are familiar from work by Papadimitriou and Tsitsiklis and others, but some, such as our PL-completeness proofs, are surprising. We include proofs of completeness for natural problems in the as yet little-studied class NP^PP.
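    For intuition about the tractable base case, here is a minimal sketch (not taken from the paper) of the standard backward-induction Bellman backup for a fully observable, explicitly represented finite-horizon MDP. It runs in time polynomial in the number of states, actions, and the horizon. NumPy is assumed, and the toy transition and reward numbers are hypothetical.

```python
import numpy as np

def finite_horizon_value_iteration(P, R, horizon):
    """Backward induction (Bellman backup) for a finite-horizon MDP.

    P[a][s, s'] -- probability of moving from state s to s' under action a
    R[a][s]     -- expected immediate reward for taking action a in state s
    Returns the optimal values V and a time-dependent policy, where
    policy[t][s] is an optimal action at step t.
    """
    n_actions = len(P)
    V = np.zeros(P[0].shape[0])  # value-to-go with zero steps remaining
    policy = []
    for _ in range(horizon):
        # Q[a, s] = R[a][s] + sum over s' of P[a][s, s'] * V[s']
        Q = np.array([R[a] + P[a] @ V for a in range(n_actions)])
        policy.append(Q.argmax(axis=0))  # best action per state at this step
        V = Q.max(axis=0)
    policy.reverse()  # policy[0] is the decision rule for the first step
    return V, policy

# Hypothetical two-state, two-action MDP.
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),   # transitions under action 0
     np.array([[0.5, 0.5], [0.0, 1.0]])]   # transitions under action 1
R = [np.array([1.0, 0.0]),                 # rewards for action 0
     np.array([0.0, 2.0])]                 # rewards for action 1

V, policy = finite_horizon_value_iteration(P, R, horizon=5)
print("optimal 5-step values:", V)
print("first-step decision rule:", policy[0])
```

    The hardness results surveyed in the paper arise precisely when the assumptions behind this sketch fail: under partial observability the controller must reason over distributions or histories rather than states, and under succinct representations the explicit table P above is exponentially large in the input.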

    References

    [1]
    ALLENDER, E., AND OGIHARA, M. 1996. Relationships among PL, #L, and the determinant. RAIRO Theoret. Inf. Appl. 30, 1, 1-21.
    [2]
    ALVAREZ, C., AND JENNER, B. 1993. A very hard log-space counting class. Theoret. Comput. Sci. 107, 3-30.
    [3]
    AOKI, M. 1965. Optimal control of partially observed Markovian systems. J. Franklin Inst. 280, 367-368.
    [4]
    ARROYO-FIGUEROA, G., AND SUCAR, L.E. 1999. A temporal Bayesian network for diagnosis and prediction. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence. Morgan-Kaufman, San Francisco, Calif.
    [5]
    ASTROM, K. 1965. Optimal control of Markov processes with incomplete state information. J. Math. Analy. Appl. 10, 174-205.
    [6]
    BÄCKSTRÖM, C. 1995. Expressive equivalence of planning formalisms. Artif. Int. 76, 17-34.
    [7]
    BALCAZAR, J., LOZANO, A., AND TORAN, J. 1992. The complexity of algorithmic problems on succinct instances. In Computer Science, R. Baeza-Yates and U. Manber, Eds. Plenum Press, New York, pp. 351-377.
    [8]
    BARRY, P., AND LASKEY, K. B. 1999. An application of uncertain reasoning to requirements engineering. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence. Morgan-Kaufman, San Francisco, Calif.
    [9]
    BEAUQUIER, D., BURAGO, D., AND SLISSENKO, A. 1995. On the complexity of finite memory policies for Markov decision processes. In Mathematical Foundations of Computer Science. Lecture Notes in Computer Science, vol. 969. Springer-Verlag, New York, pp. 191-200.
    [10]
    BEIGEL, R., REINGOLD, N., AND SPIELMAN, D. 1995. PP is closed under intersection. J. Comput. Syst. Sci. 50, 191-202.
    [11]
    BELLMAN, R. 1957. Dynamic Programming. Princeton University Press, Princeton, N.J.
    [12]
    BLONDEL, V., AND TSITSIKLIS, J. 1998. A survey of computational complexity results in systems and control. Available from http://web.mit.edu/jnt/www/publ.html or http://web.mit.edu/~jnt/survey.ps (Postscript, 785K). To appear in Automatica.
    [13]
    BLYTHE, J. 1999. Decision-theoretic planning. AI Magazine 20, 2, 37-54.
    [14]
    BOMAN, DAVIDSSON, AND YOUNES. 1999. Artificial decision making in intelligent buildings. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence. Morgan-Kaufman, San Francisco, Calif.
    [15]
    BOUTILIER, C., DEAN, T., AND HANKS, S. 1995a. Planning under uncertainty: Structural assumptions and computational leverage. In Proceedings of the 2nd European Workshop on Planning.
    BOUTILIER, C., DEAN, T., AND HANKS, S. 1999a. Decision-theoretic planning: Structural assumptions and computational leverage. J. AI Res. 11, 1-94.
    [16]
    BOUTILIER, C., DEARDEN, R., AND GOLDSZMIDT, M. 1995b. Exploiting structure in policy construction. In Proceedings of the 14th International Conference on AI.
    [17]
    BOUTILIER, C., GOLDSZMIDT, M., AND SABATA, B. 1999b. Continuous value function approximation for sequential bidding policies. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence. Morgan-Kaufman, San Francisco, Calif.
    [18]
    BRYANT, R.E. 1991. On the complexity of VLSI implementations and graph representations of boolean functions with application to integer multiplication. IEEE Trans. Comput. C-40, 2, 205-213.
    [19]
    BURAGO, D., DE ROUGEMONT, M., AND SLISSENKO, A. 1996. On the complexity of partially observed Markov decision processes. Theoret. Comput. Sci. 157, 2, 161-183.
    [20]
    BYLANDER, T. 1994. The computational complexity of propositional STRIPS planning. Artif. Int. 69, 165-204.
    [21]
    CASSANDRA, A. 1998. Exact and Approximate Algorithms for Partially Observable Markov Decision Processes. Ph.D. dissertation. Brown Univ., Providence, R.I.
    [22]
    CASSANDRA, A., KAELBLING, L., AND LITTMAN, M. 1994. Acting optimally in partially observable stochastic domains. In Proceedings of AAAI-94.
    [23]
    CASSANDRA, A., KAELBLING, L., AND LITTMAN, M. 1995. Efficient dynamic-programming updates in partially observable Markov decision processes. Tech. Rep. CS-95-19. Brown Univ., Providence, R.I.
    [24]
    CASSANDRA, A., LITTMAN, M., AND ZHANG, N. 1997. Incremental pruning: A simple, fast, exact method for partially observable Markov decision processes. In Proceedings of the 13th Conference on Uncertainty in Artificial Intelligence (UAI-97). D. Geiger and P. P. Shenoy, Eds. Morgan-Kaufmann, San Francisco, Calif., pp. 54-61.
    [25]
    CHAPMAN, D. 1987. Planning for conjunctive goals. Artif. Int. 32, 333-379.
    [26]
    DYNKIN, E. 1965. Controlled random sequences. Theory Prob. Appl. 10, 1-14.
    [27]
    EROL, K., HENDLER, J., AND NAU, D. 1996. Complexity results for hierarchical task-network planning. Ann. Math. Artif. Int. 18, 69-93.
    [28]
    EROL, K., NAU, D., AND SUBRAHMANIAN, V. 1995. Complexity, decidability and undecidability results for domain-independent planning. Artif. Int. 76, 75-88.
    [29]
    FENNER, S., FORTNOW, L., AND KURTZ, S. 1994. Gap-definable counting classes. J. Comput. Syst. Sci. 48, 1, 116-148.
    [30]
    GALPERIN, H., AND WIGDERSON, A. 1983. Succinct representation of graphs. Inf. Cont. 56, 183-198.
    [31]
    GOLABI, K., KULKARNI, R., AND WAY, G. 1982. A statewide pavement management system. Interfaces 12, 5-21.
    [32]
    GOLDSMITH, J., LITTMAN, M., AND MUNDHENK, M. 1997. The complexity of plan existence and evaluation in probabilistic domains. In Proceedings of the 13th Conference on Uncertainty in AI. Morgan-Kaufmann, San Francisco, Calif.
    [33]
    GOLDSMITH, J., AND MUNDHENK, M. 1998. Complexity issues in Markov decision processes. In Proceedings of IEEE Conference on Computational Complexity. IEEE Computer Society Press, Los Alamitos, Calif.
    [34]
    HANSEN, E. 1998a. Finite-Memory Control of Partially Observable Systems. Ph.D. dissertation, Dept. of Computer Science, University of Massachusetts at Amherst, Amherst, Massachusetts.
    [35]
    HANSEN, E. 1998b. Solving POMDPs by searching in policy space. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence.
    [36]
    HAUSKRECHT, M. 1997. Planning and Control in Stochastic Domains with Imperfect Information. Ph.D. dissertation. Massachusetts Institute of Technology, Cambridge, Mass.
    [37]
    HOCHBAUM, D. 1997. Approximation Algorithms for NP-Hard Problems. PWS Publishing Company.
    [38]
    HOEY, J., ST.-AUBIN, R., Hu, A., AND BOUTILIER, C. 1999. SPUDD: Stochastic planning using decision diagrams. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence. Morgan-Kaufman, San Francisco, Calif., pp. 279-288.
    [39]
    HORVITZ, E., JACOBS, A., AND HOVEL, D. 1999. Attention-sensitive alerting in computing systems. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence. Morgan-Kaufman, San Francisco, Calif.
    [40]
    HOWARD, R. 1960. Dynamic Programming and Markov Processes. MIT Press, Cambridge, Mass.
    JUNG, H. 1984. On probabilistic tape complexity and fast circuits for matrix inversion problems. In Proceedings of the 11th ICALP. Lecture Notes in Computer Science, vol. 172. Springer-Verlag, New York, pp. 281-291.
    [41]
    JUNG, H. 1985. On probabilistic time and space. In Proceedings of the 12th ICALP. Lecture Notes in Computer Science, vol. 194. Springer-Verlag, New York, pp. 281-291.
    [42]
    KORB, K. B., NICHOLSON, A. E., AND JITNAH, N. 1999. Bayesian poker. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence. Morgan-Kaufman, San Francisco, Calif.
    [43]
    KUSHMERICK, N., HANKS, S., AND WELD, D. 1995. An algorithm for probabilistic planning. Artif. Int. 76, 239-286.
    [44]
    LADNER, R. 1989. Polynomial space counting problems. SIAM J. Comput. 18, 1087-1097.
    [45]
    LITTMAN, M. 1996a. Algorithms for Sequential Decision Making. Ph.D. dissertation. Brown Univ., Providence, R.I.
    [46]
    LITTMAN, M. 1996b. Probabilistic STRIPS planning is EXPTIME-complete. Tech. Rep. CS-1996-18 (November). Dept. Computer Science, Duke Univ.
    [47]
    LITTMAN, M. 1997a. Probabilistic propositional planning: Representations and complexity. In Proceedings of the 14th National Conference on Artificial Intelligence. AAAI Press/The MIT Press, Cambridge, Mass.
    [48]
    LITTMAN, M. 1997b. Probabilistic propositional planning: Representations and complexity. In Proceedings of the 14th National Conference on Artificial Intelligence. AAAI Press/MIT Press, Cambridge, Mass.
    [49]
    LITTMAN, M.L. 1999a. Initial experiments in probabilistic satisfiability. In Proceedings of AAAI-99. In preparation for conference submission.
    [50]
    LITTMAN, M. L. 1999b. Initial experiments in stochastic satisfiability. In Proceedings of the 16th National Conference on Artificial Intelligence. AAAI Press/MIT Press, Cambridge, Mass.
    [51]
    LITTMAN, M., DEAN, T., AND KAELBLING, L. 1995. On the complexity of solving Markov decision problems. In Proceedings of the 11th Annual Conference on Uncertainty in Artificial Intelligence. pp. 394-402.
    [52]
    LITTMAN, M., GOLDSMITH, J., AND MUNDHENK, M. 1998. The computational complexity of probabilistic plan existence and evaluation. J. AI Res. 9, 1-36.
    [53]
    LOVEJOY, W. 1991. A survey of algorithmic methods for partially observed Markov decision processes. Ann. Oper. Res. 28, 47-66.
    [54]
    LUSENA, C., LI, T., SITTINGER, S., WELLS, C., AND GOLDSMITH, J. 1999. My brain is full: When more memory helps. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence. pp. 374-381.
    [55]
    MADANI, O., HANKS, S., AND CONDON, A. 1999. On the undecidability of probabilistic planning and infinite-horizon partially observable Markov decision problems. In Proceedings of the 16th National Conference on Artificial Intelligence.
    [56]
    MAJERCIK, S. M., AND LITTMAN, M. L. 1998a. MAXPLAN: A new approach to probabilistic planning. In Artificial Intelligence and Planning Systems. pp. 86-93.
    [57]
    MAJERCIK, S. M., AND LITTMAN, M.L. 1998b. Using caching to solve larger probabilistic planning problems. In Proceedings of 15th National Conference on Artificial Intelligence. pp. 954-959.
    [58]
    MEULEAU, N., KIM, K.-E., KAELBLING, L. P., AND CASSANDRA, A.R. 1999. Solving POMDPs by searching the space of finite policies. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence, pp. 417-426.
    [59]
    MEULEAU, N., PESHKIN, L., KIM, K.-E., AND KAELBLING, L.P. 1999. Learning finite-state controllers for partially observable environments. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence, pp. 427-436.
    [60]
    MISLEVY, R. J., ALMOND, R. G., YAN, D., AND STEINBERG, L.S. 1999. Bayes nets in educational assessment: Where the numbers come from. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence. Morgan-Kaufman, San Francisco, Calif.
    [61]
    MONAHAN, G. 1982. A survey of partially observable Markov decision processes: Theory, models, and algorithms. Manag. Sci. 28, 1-16.
    [62]
    MUNDHENK, M. 2000. The complexity of optimal small policies. Math. Oper. Res. 25, 1, 118-129.
    [63]
    MUNDHENK, M., GOLDSMITH, J., AND ALLENDER, E. 1997. The complexity of the policy existence problem for partially-observable finite-horizon Markov decision processes. In Proceedings of the 22nd Symposium on Mathematical Foundations of Computer Science. Lecture Notes in Computer Science, vol. 1295. Springer-Verlag, New York, pp. 129-138.
    [64]
    NGUYEN, H., AND HADDAWY, P. 1999. The decision-theoretic interactive video advisor. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence. Morgan-Kaufman, San Francisco, Calif.
    [65]
    PAPADIMITRIOU, C. 1994. Computational Complexity. Addison-Wesley, Reading, Mass.
    [66]
    PAPADIMITRIOU, C., AND TSITSIKLIS, J. 1986. Intractable problems in control theory. SIAM J. Cont. Optim. 24, 4, 639-654.
    [67]
    PAPADIMITRIOU, C., AND TSITSIKLIS, J. 1987. The complexity of Markov decision processes. Math. Oper. Res. 12, 3, 441-450.
    [68]
    PAPADIMITRIOU, C., AND YANNAKAKIS, M. 1986. A note on succinct representations of graphs. Inf. Cont. 71, 181-185.
    [69]
    PARR, R., AND RUSSELL, S. 1995. Approximating optimal policies for partially observable stochastic domains. In Proceedings of IJCAI-95.
    [70]
    PESHKIN, L., MEULEAU, N., AND KAELBLING, L.P. 1999. Learning policies with external memory. In Proceedings of the 16th International Conference on Machine Learning.
    [71]
    PLATZMAN, L. 1977. Finite-memory estimation and control of finite probabilistic systems. Ph.D. dissertation. Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Mass.
    [72]
    PORTINALE, L., AND BOBBIO, A. 1999. Bayesian networks for dependability analysis: an application to digital control reliability. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence. Morgan-Kaufman, San Francisco, Calif.
    [73]
    PUTERMAN, M. 1994. Markov Decision Processes. Wiley, New York.
    [74]
    PYEATT, L. 1999. Integration of Partially Observable Markov Decision Processes and Reinforcement Learning for Simulated Robot Navigation. Ph.D. dissertation. Colorado State Univ.
    [75]
    SHATKAY, H. 1999. Learning hidden Markov models with geometrical constraints. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence. Morgan-Kaufman, San Francisco, Calif.
    [76]
    SIMMONS, R., AND KOENIG, S. 1995. Probabilistic robot navigation in partially observable environments. In Proceedings of IJCAI-95.
    [77]
    SMALLWOOD, R., AND SONDIK, E. 1973. The optimal control of partially observed Markov processes over the finite horizon. Oper. Res. 21, 1071-1088.
    [78]
    SONDIK, E. 1971. The optimal control of partially observable Markov processes. Ph.D. dissertation. Stanford Univ., Stanford, Calif.
    [79]
    STRIEBEL, C. 1965. Sufficient statistics in the optimal control of stochastic systems. J. Math. Anal. Appl. 12, 576-592.
    [80]
    SUTTON, R. S., AND BARTO, A.G. 1998. Reinforcement Learning: An Introduction. The MIT Press, Cambridge, Mass.
    [81]
    TODA, S. 1991. PP is as hard as the polynomial-time hierarchy. SIAM J. Comput. 20, 865-877.
    [82]
    TORÁN, J. 1991. Complexity classes defined by counting quantifiers. J. ACM 38, 3, 753-774.
    [83]
    TSENG, P. 1990. Solving h-horizon, stationary Markov decision problems in time proportional to log h. Oper. Res. Lett. 9, 5 (Sept.), 287-297.
    [84]
    VAN DER GAAG, L., RENOOIJ, S., WITTEMAN, C., ALEMAN, B., AND TALL, B. 1999. How to elicit many probabilities. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence. Morgan-Kaufman, San Francisco, Calif., pp. 647-654.
    [85]
    VINAY, V. 1991. Counting auxiliary pushdown automata and semi-unbounded arithmetic circuits. In Proceedings of the 6th Structure in Complexity Theory Conference. IEEE Computer Society Press, Los Alamitos, Calif., pp. 270-284.
    [86]
    WAGNER, K. 1986. The complexity of combinatorial problems with succinct input representation. Acta Inf. 23, 325-356.
    [87]
    WELCH, R. L., AND SMITH, C. 1999. A process control algorithm for concentrating mixed-waste based on Bayesian cg-networks. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence. Morgan-Kaufman, San Francisco, Calif.
    [88]
    WHITE, C., III. 1991. Partially observed Markov decision processes: A survey. Ann. Oper. Res. 32, 215-230.
    [89]
    ZHANG, N., AND LIU, W. 1997. A model approximation scheme for planning in partially observable stochastic domains. J. Artif. Int. Res. 7, 199-230.
    [90]
    ZHANG, N. L., LEE, S. S., AND ZHANG, W. 1999. A method for speeding up value iteration in partially observable Markov decision processes. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence. Morgan-Kaufman, San Francisco, Calif., pp. 696-703.


    Published In

    Journal of the ACM, Volume 47, Issue 4
    July 2000
    238 pages
    ISSN: 0004-5411
    EISSN: 1557-735X
    DOI: 10.1145/347476

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Markov decision processes
    2. NP
    3. NP^PP
    4. PL
    5. PSPACE
    6. computational complexity
    7. partially observable Markov decision processes
    8. succinct representations


    Cited By

    • (2024) Multi-agent reinforcement learning based optimal energy sensing threshold control in distributed cognitive radio networks with directional antenna. ICT Express 10(3), 472-478. DOI: 10.1016/j.icte.2024.01.001. Online publication date: Jun-2024.
    • (2024) Play it safe or leave the comfort zone? Optimal content strategies for social media influencers on streaming video platforms. Decision Support Systems 179, 114148. DOI: 10.1016/j.dss.2023.114148. Online publication date: Apr-2024.
    • (2024) Controlling weighted voting games by deleting or adding players with or without changing the quota. Annals of Mathematics and Artificial Intelligence 92(3), 631-669. DOI: 10.1007/s10472-023-09874-x. Online publication date: 1-Jun-2024.
    • (2024) On the computational complexity of ethics: moral tractability for minds and machines. Artificial Intelligence Review 57(4). DOI: 10.1007/s10462-024-10732-3. Online publication date: 31-Mar-2024.
    • (2024) Artificial virtuous agents in a multi-agent tragedy of the commons. AI & Society 39(3), 855-872. DOI: 10.1007/s00146-022-01569-x. Online publication date: 1-Jun-2024.
    • (2023) Learning in online MDPs. Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, 293-302. DOI: 10.5555/3625834.3625862. Online publication date: 31-Jul-2023.
    • (2023) Provably efficient representation learning with tractable planning in low-rank POMDP. Proceedings of the 40th International Conference on Machine Learning, 11967-11997. DOI: 10.5555/3618408.3618888. Online publication date: 23-Jul-2023.
    • (2023) Game Interactive Learning: A New Paradigm towards Intelligent Decision-Making. CAAI Artificial Intelligence Research, 9150027. DOI: 10.26599/AIR.2023.9150027. Online publication date: Dec-2023.
    • (2023) Optimistic MLE: A Generic Model-Based Algorithm for Partially Observable Sequential Decision Making. Proceedings of the 55th Annual ACM Symposium on Theory of Computing, 363-376. DOI: 10.1145/3564246.3585161. Online publication date: 2-Jun-2023.
    • (2023) Approximability and efficient algorithms for constrained fixed-horizon POMDPs with durative actions. Artificial Intelligence 323. DOI: 10.1016/j.artint.2023.103968. Online publication date: 1-Oct-2023.
