In this paper, we bring techniques from operations research to bear on the problem of choosing optimal actions in partially observable stochastic domains. We begin by introducing the theory of Markov decision processes (MDPs) and partially observable MDPs (POMDPs). We then outline a novel algorithm for solving POMDPs off line and show how, in some cases, a finite-memory controller can be extracted from the solution to a POMDP. We conclude with a discussion of the complexity of finding exact solutions to POMDPs and of some possibilities for finding approximate solutions.
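The POMDP machinery the abstract refers to rests on maintaining a belief state, a probability distribution over hidden states, updated by Bayes' rule after each action and observation. As a minimal sketch (a toy two-state "tiger"-style problem with illustrative numbers, not the paper's own algorithm), the update is b'(s') ∝ O(s', o) Σ_s T(s, s') b(s):

```python
import numpy as np

# Illustrative two-state POMDP (hypothetical toy problem, not from the paper):
# states: 0 = tiger-left, 1 = tiger-right. The "listen" action leaves the
# state unchanged; observations are noisy hints about the tiger's location.
T_listen = np.array([[1.0, 0.0],    # T[s, s']: transition probabilities
                     [0.0, 1.0]])
O_listen = np.array([[0.85, 0.15],  # O[s', o]: P(hear-left / hear-right | s')
                     [0.15, 0.85]])

def belief_update(b, T, O, obs):
    """One Bayes-filter step: b'(s') ∝ O[s', obs] * sum_s T[s, s'] * b(s)."""
    unnormalized = O[:, obs] * (T.T @ b)
    return unnormalized / unnormalized.sum()

b = np.array([0.5, 0.5])                          # uniform prior
b = belief_update(b, T_listen, O_listen, obs=0)   # hear a noise on the left
print(np.round(b, 3))                             # → [0.85 0.15]
```

Because the belief state is a sufficient statistic for the history of actions and observations, a POMDP can be recast as a fully observable MDP over belief space, which is the reformulation the off-line solution methods in the paper build on.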
Cited By
- Schmidhuber J (2015). Deep learning in neural networks, Neural Networks, 61:C, (85-117), Online publication date: 1-Jan-2015.
- Paquet S, Tobin L and Chaib-draa B An online POMDP algorithm used by the police force agents in the RoboCupRescue simulation RoboCup 2005, (196-207)
- Dini D, Lent M, Carpenter P and Iyer K Building robust planning and execution systems for virtual worlds Proceedings of the Second AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, (29-35)
- Youngblood G, Cook D and Holder L (2005). Managing Adaptive Versatile environments, Pervasive and Mobile Computing, 1:4, (373-403), Online publication date: 1-Dec-2005.
- Paquet S, Tobin L and Chaib-draa B An online POMDP algorithm for complex multiagent environments Proceedings of the fourth international joint conference on Autonomous agents and multiagent systems, (970-977)
- Yang Q and Cheng H Mining Plans for Customer-Class Transformation Proceedings of the Third IEEE International Conference on Data Mining
- McMahan H, Gordon G and Blum A Planning in the presence of cost functions controlled by an adversary Proceedings of the Twentieth International Conference on Machine Learning, (536-543)
- Hansen E Solving POMDPs by searching in policy space Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence, (211-219)
- Schmidhuber J, Zhao J and Wiering M (1997). Shifting Inductive Bias with Success-Story Algorithm, Adaptive Levin Search, and Incremental Self-Improvement, Machine Learning, 28:1, (105-130), Online publication date: 1-Jul-1997.
- Goldsmith J, Littman M and Mundhenk M The complexity of plan existence and evaluation in probabilistic domains Proceedings of the Thirteenth conference on Uncertainty in artificial intelligence, (182-189)
- Poole D A framework for decision-theoretic planning I Proceedings of the Twelfth international conference on Uncertainty in artificial intelligence, (436-445)