Exploiting structure to efficiently solve large scale partially observable Markov decision processes
Publisher:
  • University of Toronto
  • Computer Center Toronto, Ont. M5S 1A1
  • Canada
ISBN: 978-0-494-02727-1
Order Number: AAINR02727
Pages: 144
Abstract

Partially observable Markov decision processes (POMDPs) provide a natural and principled framework for modeling a wide range of sequential decision-making problems under uncertainty. To date, the use of POMDPs in real-world problems has been limited by the poor scalability of existing solution algorithms, which can only solve problems with up to ten thousand states. Indeed, finding an optimal policy for a finite-horizon discrete POMDP is PSPACE-complete. In practice, two important sources of intractability plague most solution algorithms: large policy spaces and large state spaces.
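POMDP solvers operate on belief states: probability distributions over the hidden states, revised by Bayes' rule after every action and observation. A minimal sketch of the standard discrete belief update follows. The nested-list layout of `T` and `O`, the function name `belief_update`, and the tiger-style numbers in the usage lines are illustrative assumptions, not taken from the thesis. With dense transition tables each update touches every pair of states, so its cost grows quadratically with a flat state space, which is one concrete way large state spaces hurt.

```python
from typing import List

Matrix = List[List[float]]


def belief_update(b: List[float], T: List[Matrix], O: List[Matrix], a: int, o: int) -> List[float]:
    """Bayes filter for a discrete POMDP (hypothetical array layout).

    T[a][s][s2] = Pr(s2 | s, a) and O[a][s2][o] = Pr(o | s2, a).
    Returns b2 with b2(s2) proportional to O[a][s2][o] * sum_s T[a][s][s2] * b(s).
    """
    n = len(b)
    # Predict the next-state distribution, then weight it by the observation likelihood.
    unnormalized = [
        O[a][s2][o] * sum(T[a][s][s2] * b[s] for s in range(n))
        for s2 in range(n)
    ]
    z = sum(unnormalized)  # equals Pr(o | b, a)
    if z <= 0.0:
        raise ValueError("observation o has zero probability under belief b and action a")
    return [p / z for p in unnormalized]


# Usage: a tiger-style "listen" action (identity transitions, 85%-accurate hearing).
T = [[[1.0, 0.0], [0.0, 1.0]]]
O = [[[0.85, 0.15], [0.15, 0.85]]]
b1 = belief_update([0.5, 0.5], T, O, a=0, o=0)
print(b1)
```

Hearing the same observation repeatedly keeps sharpening the belief, while an observation with zero likelihood is rejected rather than silently producing a division by zero.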

On the other hand, for many real-world POMDPs it is possible to define effective policies with simple rules of thumb. This suggests that we may be able to find small policies that are near-optimal. This thesis first presents a Bounded Policy Iteration (BPI) algorithm that robustly finds a good policy represented by a small finite-state controller. Real-world POMDPs also tend to exhibit structural properties that can be exploited to mitigate the effect of large state spaces. To that end, a value-directed compression (VDC) technique is also presented to reduce POMDP models to lower-dimensional representations.
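A finite-state controller is a small graph: each node prescribes an action, and the observation received selects the successor node. Its value function satisfies one linear equation per (node, state) pair, V(n, s) = R(s, a_n) + γ Σ_s' Pr(s' | s, a_n) Σ_o Pr(o | s', a_n) V(succ(n, o), s'). The sketch below only evaluates a fixed controller, iterating that equation to convergence instead of solving the linear system directly; BPI's node-improvement step, which involves a linear program for each node, is not reproduced. The layout of `act`, `succ`, `T`, `O`, `R` and both function names are hypothetical, and the value of a belief is taken as the best starting node's value.

```python
from typing import List


def evaluate_controller(act, succ, T, O, R, gamma, tol=1e-10, max_iters=100_000):
    """Iteratively evaluate a fixed finite-state controller (hypothetical layout).

    act[n]      : action taken in controller node n
    succ[n][o]  : node reached from n after observing o
    T[a][s][s2] = Pr(s2 | s, a), O[a][s2][o] = Pr(o | s2, a), R[a][s] = reward
    Returns V with V[n][s] = expected discounted return of starting node n in state s.
    """
    n_nodes, n_states, n_obs = len(act), len(R[0]), len(O[0][0])
    V: List[List[float]] = [[0.0] * n_states for _ in range(n_nodes)]
    for _ in range(max_iters):
        new_V = [
            [
                R[act[n]][s] + gamma * sum(
                    T[act[n]][s][s2] * O[act[n]][s2][o] * V[succ[n][o]][s2]
                    for s2 in range(n_states)
                    for o in range(n_obs)
                )
                for s in range(n_states)
            ]
            for n in range(n_nodes)
        ]
        # Stop once one sweep changes no entry by more than tol.
        delta = max(abs(new_V[n][s] - V[n][s]) for n in range(n_nodes) for s in range(n_states))
        V = new_V
        if delta < tol:
            break
    return V


def controller_value(V, b):
    """Value of belief b when the controller may start in its best node."""
    return max(sum(v * p for v, p in zip(V[n], b)) for n in range(len(V)))


# Usage: a one-node controller that always listens in a tiger-style problem
# (assumed model: action 0 = listen, reward -1 per step, hidden state never changes).
T = [[[1.0, 0.0], [0.0, 1.0]]]
O = [[[0.85, 0.15], [0.15, 0.85]]]
R = [[-1.0, -1.0]]
V = evaluate_controller(act=[0], succ=[[0, 0]], T=T, O=O, R=R, gamma=0.95)
print(controller_value(V, [0.5, 0.5]))  # close to -1 / (1 - 0.95) = -20
```

Because the controller has only as many nodes as the caller allows, the evaluation cost is bounded by the controller size rather than by the number of reachable beliefs.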

In practice, it is critical to mitigate the impact of complex policy representations and large state spaces simultaneously. Hence, this thesis describes three approaches that combine techniques for dealing with each source of intractability: VDC with BPI, VDC with Perseus (a randomized point-based value iteration algorithm by Spaan and Vlassis [136]), and state abstraction with Perseus. The scalability of these approaches is demonstrated on two problems with more than 33 million states: a synthetic network management problem and a real-world system designed to assist elderly persons with cognitive deficiencies in carrying out simple daily tasks such as hand-washing. This represents an important step towards the deployment of POMDP techniques in ever larger, real-world, sequential decision-making problems.
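Perseus performs approximate value iteration over a fixed set B of belief points, with the value function represented by a set of alpha-vectors. In each stage it backs up randomly chosen beliefs from B and skips any belief whose value has already improved thanks to a vector found for another belief, so one backup often improves many points at once. The following is a minimal sketch on flat (uncompressed) beliefs. It omits the VDC and state-abstraction layers the thesis combines with Perseus, uses a fixed grid for B instead of beliefs gathered by random exploration, and uses the classic two-state tiger problem as an assumed test model; all helper names are hypothetical.

```python
import random
from typing import List

Vec = List[float]


def dot(u: Vec, v: Vec) -> float:
    return sum(x * y for x, y in zip(u, v))


def value(V: List[Vec], b: Vec) -> float:
    """Value of belief b under the alpha-vector set V (upper envelope)."""
    return max(dot(alpha, b) for alpha in V)


def backup(b, V, T, O, R, gamma):
    """Point-based Bellman backup at belief b; returns one new alpha-vector."""
    n_states, n_obs = len(b), len(O[0][0])
    best_alpha, best_val = None, float("-inf")
    for a in range(len(T)):
        g_a = list(R[a])
        for o in range(n_obs):
            # Back-project every alpha-vector through action a and observation o ...
            projections = [
                [sum(T[a][s][s2] * O[a][s2][o] * alpha[s2] for s2 in range(n_states))
                 for s in range(n_states)]
                for alpha in V
            ]
            # ... and keep the projection that is best for this belief.
            g_ao = max(projections, key=lambda g: dot(g, b))
            for s in range(n_states):
                g_a[s] += gamma * g_ao[s]
        val = dot(g_a, b)
        if val > best_val:
            best_alpha, best_val = g_a, val
    return best_alpha


def perseus_stage(B, V, T, O, R, gamma, rng):
    """One stage: returns new_V with value(new_V, b) >= value(V, b) for every b in B."""
    new_V: List[Vec] = []
    todo = list(B)  # beliefs not yet improved during this stage
    while todo:
        b = rng.choice(todo)
        alpha = backup(b, V, T, O, R, gamma)
        if dot(alpha, b) < value(V, b):
            # The backup did not help at b: carry over the old best vector instead.
            alpha = max(V, key=lambda v: dot(v, b))
        new_V.append(alpha)
        todo = [bb for bb in todo if value(new_V, bb) < value(V, bb)]
    return new_V


# Usage on the tiger problem (assumed model).
# States: 0 = tiger left, 1 = tiger right. Actions: 0 = listen, 1 = open left, 2 = open right.
# Observations: 0 = hear left, 1 = hear right.
TIGER_T = [
    [[1.0, 0.0], [0.0, 1.0]],  # listening leaves the tiger where it is
    [[0.5, 0.5], [0.5, 0.5]],  # opening a door resets the problem
    [[0.5, 0.5], [0.5, 0.5]],
]
TIGER_O = [
    [[0.85, 0.15], [0.15, 0.85]],
    [[0.5, 0.5], [0.5, 0.5]],
    [[0.5, 0.5], [0.5, 0.5]],
]
TIGER_R = [[-1.0, -1.0], [-100.0, 10.0], [10.0, -100.0]]

rng = random.Random(0)
B = [[p / 10, 1 - p / 10] for p in range(11)]
V = [[-2000.0, -2000.0]]  # pessimistic start: min reward / (1 - gamma)
for _ in range(30):
    V = perseus_stage(B, V, TIGER_T, TIGER_O, TIGER_R, 0.95, rng)
print(len(V), value(V, [0.5, 0.5]))
```

Starting from the pessimistic vector guarantees that every stage is a non-decreasing lower bound on B, which is what allows Perseus to back up only a random subset of the points.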

Contributors
  • David R. Cheriton School of Computer Science
