Monte Carlo sampling methods for approximating interactive POMDPs

Published: 01 March 2009 in the Journal of Artificial Intelligence Research, Volume 34, Issue 1. Received: 01 June 2008. Publisher: AI Access Foundation, El Segundo, CA.

Abstract

Partially observable Markov decision processes (POMDPs) provide a principled framework for sequential planning in uncertain single-agent settings. An extension of POMDPs to multiagent settings, called interactive POMDPs (I-POMDPs), replaces POMDP belief spaces with interactive hierarchical belief systems, which represent an agent's belief about the physical world, about the beliefs of other agents, and about their beliefs about others' beliefs. This modification makes the difficulty of obtaining solutions, already severe due to the complexity of the belief and policy spaces, even more acute. We describe a general method for obtaining approximate solutions of I-POMDPs based on particle filtering (PF). We introduce the interactive PF, which descends the levels of the interactive belief hierarchy, sampling and propagating beliefs at each level. The interactive PF is able to mitigate the belief space complexity, but it does not address the policy space complexity. To mitigate the policy space complexity, sometimes also called the curse of history, we use a complementary method based on sampling likely observations while building the look-ahead reachability tree. While this approach does not completely overcome the curse of history, it substantially reduces its impact. We provide experimental results and chart future work.
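To make the first idea concrete, here is a minimal Python sketch of one update step of an interactive particle filter of the kind the abstract describes: the recursion mirrors the belief hierarchy, so a level-l particle pairs a physical state with the other agent j's level-(l-1) belief, itself a particle set that is updated by descending one level. This is a sketch under stated assumptions, not the paper's implementation; every name on the model object `m` is a hypothetical placeholder for an I-POMDP model component.

```python
import random

def i_particle_filter(belief, level, a_i, o_i, m, n=100):
    """One Bayes-update step of the interactive particle filter (a sketch).

    A level-0 belief is a list of physical states s; a level-l belief
    (l >= 1) is a list of (s, b_j) pairs, where b_j is agent j's belief one
    level below. Assumed placeholders on the hypothetical model object `m`:
      m.act(b_j, level)           -> j's action, from solving j's model
                                     (at level 0, j is treated as noise)
      m.step(s, a_i, a_j)         -> sampled next physical state
      m.obs_prob(o, s, a_i, a_j)  -> likelihood of agent i's observation o
      m.sample_obs_j(s, a_i, a_j) -> sampled observation for agent j
    """
    propagated, weights = [], []
    for particle in belief:
        s, b_j = (particle, None) if level == 0 else particle
        a_j = m.act(b_j, level)                 # predict j's action
        s2 = m.step(s, a_i, a_j)                # propagate the physical state
        if level == 0:
            propagated.append(s2)
        else:
            o_j = m.sample_obs_j(s2, a_i, a_j)  # what might j have observed?
            # Descend one level: update j's modeled belief recursively.
            # (The roles of i and j swap at each level; glossed over here.)
            b_j2 = i_particle_filter(b_j, level - 1, a_j, o_j, m, n)
            propagated.append((s2, b_j2))
        weights.append(m.obs_prob(o_i, s2, a_i, a_j))  # weight by i's obs
    if sum(weights) == 0.0:       # degenerate case: no particle explains o_i
        return propagated
    return random.choices(propagated, weights=weights, k=n)  # resample
```

The second technique, sampling likely observations while growing the look-ahead reachability tree, can be sketched in the same style. Rather than branching on every observation, which grows exponentially with the horizon (the curse of history), the sketch below branches on only `k_obs` observations drawn from their likelihood under the current belief; averaging the sampled subtree values is then a Monte Carlo estimate of the expectation over observations. The same hypothetical model object `m` is assumed, with a few additional placeholder members.

```python
def lookahead_value(belief, level, depth, m, k_obs=3, n=100):
    """Approximate value of acting from `belief` via a sampled look-ahead
    reachability tree (a sketch, not the paper's algorithm as published).

    Additional assumed placeholders on the hypothetical model object `m`:
      m.actions                 -> agent i's action set
      m.reward(belief, a)       -> expected immediate reward
      m.sample_obs_i(belief, a) -> observation drawn from Pr(o | belief, a)
      m.gamma                   -> discount factor
    """
    if depth == 0:
        return 0.0
    best = float("-inf")
    for a in m.actions:
        value = m.reward(belief, a)
        for _ in range(k_obs):    # sample likely observations, not all of them
            o = m.sample_obs_i(belief, a)
            b2 = i_particle_filter(belief, level, a, o, m, n)
            value += (m.gamma / k_obs) * lookahead_value(
                b2, level, depth - 1, m, k_obs, n)
        best = max(best, value)
    return best
```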
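The design trade-off in both sketches is the one the abstract names: the number of particles per level controls the belief-space approximation error, while `k_obs` and `depth` control how aggressively the observation branching (the curse of history) is pruned; neither knob removes the other source of complexity.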


