Monte Carlo sampling methods for approximating interactive POMDPs

Published: 01 March 2009 in the Journal of Artificial Intelligence Research, Volume 34, Issue 1. Received: 01 June 2008. Publisher: AI Access Foundation, El Segundo, CA.

Abstract

Partially observable Markov decision processes (POMDPs) provide a principled framework for sequential planning in uncertain single-agent settings. An extension of POMDPs to multiagent settings, called interactive POMDPs (I-POMDPs), replaces POMDP belief spaces with interactive hierarchical belief systems, which represent an agent's belief about the physical world, about the beliefs of other agents, and about their beliefs about others' beliefs. This modification makes the difficulty of obtaining solutions, already severe due to the complexity of the belief and policy spaces, even more acute. We describe a general method for obtaining approximate solutions of I-POMDPs based on particle filtering (PF). We introduce the interactive PF, which descends the levels of the interactive belief hierarchy, sampling and propagating beliefs at each level. The interactive PF is able to mitigate the belief space complexity, but it does not address the policy space complexity. To mitigate the policy space complexity, sometimes also called the curse of history, we use a complementary method based on sampling likely observations while building the look-ahead reachability tree. While this approach does not completely overcome the curse of history, it substantially reduces its impact. We provide experimental results and chart future work.
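To make the first idea concrete, here is a minimal Python sketch of one update step of an interactive particle filter of the kind the abstract describes: the recursion mirrors the belief hierarchy, so a level-l particle pairs a physical state with the other agent j's level-(l-1) belief, itself a particle set that is updated by descending one level. This is a sketch under stated assumptions, not the paper's implementation; every name on the model object `m` is a hypothetical placeholder for an I-POMDP model component.

```python
import random

def i_particle_filter(belief, level, a_i, o_i, m, n=100):
    """One Bayes-update step of the interactive particle filter (a sketch).

    A level-0 belief is a list of physical states s; a level-l belief
    (l >= 1) is a list of (s, b_j) pairs, where b_j is agent j's belief one
    level below. Assumed placeholders on the hypothetical model object `m`:
      m.act(b_j, level)           -> j's action, from solving j's model
                                     (at level 0, j is treated as noise)
      m.step(s, a_i, a_j)         -> sampled next physical state
      m.obs_prob(o, s, a_i, a_j)  -> likelihood of agent i's observation o
      m.sample_obs_j(s, a_i, a_j) -> sampled observation for agent j
    """
    propagated, weights = [], []
    for particle in belief:
        s, b_j = (particle, None) if level == 0 else particle
        a_j = m.act(b_j, level)                 # predict j's action
        s2 = m.step(s, a_i, a_j)                # propagate the physical state
        if level == 0:
            propagated.append(s2)
        else:
            o_j = m.sample_obs_j(s2, a_i, a_j)  # what might j have observed?
            # Descend one level: update j's modeled belief recursively.
            # (The roles of i and j swap at each level; glossed over here.)
            b_j2 = i_particle_filter(b_j, level - 1, a_j, o_j, m, n)
            propagated.append((s2, b_j2))
        weights.append(m.obs_prob(o_i, s2, a_i, a_j))  # weight by i's obs
    if sum(weights) == 0.0:       # degenerate case: no particle explains o_i
        return propagated
    return random.choices(propagated, weights=weights, k=n)  # resample
```

The second technique, sampling likely observations while growing the look-ahead reachability tree, can be sketched in the same style. Rather than branching on every observation, which grows exponentially with the horizon (the curse of history), the sketch below branches on only `k_obs` observations drawn from their likelihood under the current belief; averaging the sampled subtree values is then a Monte Carlo estimate of the expectation over observations. The same hypothetical model object `m` is assumed, with a few additional placeholder members.

```python
def lookahead_value(belief, level, depth, m, k_obs=3, n=100):
    """Approximate value of acting from `belief` via a sampled look-ahead
    reachability tree (a sketch, not the paper's algorithm as published).

    Additional assumed placeholders on the hypothetical model object `m`:
      m.actions                 -> agent i's action set
      m.reward(belief, a)       -> expected immediate reward
      m.sample_obs_i(belief, a) -> observation drawn from Pr(o | belief, a)
      m.gamma                   -> discount factor
    """
    if depth == 0:
        return 0.0
    best = float("-inf")
    for a in m.actions:
        value = m.reward(belief, a)
        for _ in range(k_obs):    # sample likely observations, not all of them
            o = m.sample_obs_i(belief, a)
            b2 = i_particle_filter(belief, level, a, o, m, n)
            value += (m.gamma / k_obs) * lookahead_value(
                b2, level, depth - 1, m, k_obs, n)
        best = max(best, value)
    return best
```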
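The design trade-off in both sketches is the one the abstract names: the number of particles per level controls the belief-space approximation error, while `k_obs` and `depth` control how aggressively the observation branching (the curse of history) is pruned; neither knob removes the other source of complexity.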


