DOI: 10.5555/2886521.2886598
Article

Scalable planning and learning for multiagent POMDPs

Published: 25 January 2015

Abstract

Online, sample-based planning algorithms for POMDPs have shown great promise in scaling to problems with large state spaces, but they become intractable for large action and observation spaces. This is particularly problematic in multiagent POMDPs, where the action and observation spaces grow exponentially with the number of agents. To combat this intractability, we propose a novel scalable approach based on sample-based planning and factored value functions that exploits structure present in many multiagent settings. This approach applies not only to planning but also to the Bayesian reinforcement learning setting. Experimental results show that we are able to provide high-quality solutions to large multiagent planning and learning problems.
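
To make the factored-value-function idea concrete, the sketch below shows the standard coordination-graph trick the abstract alludes to: if the joint Q-value decomposes into pairwise payoff tables over pairs of agents, the maximizing joint action can be recovered by variable elimination, agent by agent, instead of enumerating the exponentially large joint action space. This is a minimal illustration of the general technique, not the paper's implementation; the names (q_factors, eliminate_agent, argmax_joint_action) and the toy payoff tables are assumptions made for this example.

```python
# Minimal sketch (not the paper's implementation): choosing a joint action
# from a factored Q-function Q(a) = sum over edges e of Q_e(a_i, a_j),
# using variable elimination instead of enumerating all |A|^n joint actions.
# A factor is a (scope, table) pair: scope is a tuple of agent indices and
# table maps an action tuple (in scope order) to a real-valued payoff.
from itertools import product

def eliminate_agent(factors, agent, n_actions):
    """Max out one agent, returning the reduced factor list plus a
    best-response table used later to recover that agent's action."""
    touching = [f for f in factors if agent in f[0]]
    rest = [f for f in factors if agent not in f[0]]
    # The new factor's scope is every other agent the eliminated agent
    # shares a factor with (its neighbors in the coordination graph).
    scope = tuple(sorted({a for s, _ in touching for a in s} - {agent}))
    table, best = {}, {}
    for ctx_actions in product(range(n_actions), repeat=len(scope)):
        ctx = dict(zip(scope, ctx_actions))
        vals = []
        for ai in range(n_actions):
            ctx[agent] = ai
            vals.append(sum(t[tuple(ctx[a] for a in s)] for s, t in touching))
        best[ctx_actions] = max(range(n_actions), key=vals.__getitem__)
        table[ctx_actions] = vals[best[ctx_actions]]
    return rest + [(scope, table)], (scope, best)

def argmax_joint_action(n_agents, n_actions, factors):
    """Maximize a sum of local payoff tables by variable elimination."""
    traces = []
    for agent in range(n_agents):
        factors, trace = eliminate_agent(factors, agent, n_actions)
        traces.append((agent, trace))
    value = sum(t[()] for _, t in factors)  # only constant factors remain
    # Backtrack in reverse elimination order to read off the argmax:
    # each agent's scope contains only agents eliminated after it.
    action = {}
    for agent, (scope, best) in reversed(traces):
        action[agent] = best[tuple(action[a] for a in scope)]
    return [action[i] for i in range(n_agents)], value

# Toy example: three agents on a chain, Q = Q01(a0, a1) + Q12(a1, a2),
# where Q01 rewards agreement and Q12 rewards disagreement.
q_factors = [
    ((0, 1), {(i, j): float(i == j) for i in range(2) for j in range(2)}),
    ((1, 2), {(i, j): float(i != j) for i in range(2) for j in range(2)}),
]
print(argmax_joint_action(3, 2, q_factors))  # ([1, 1, 0], 2.0)
```

In a sample-based planner of the kind the abstract describes, the exact tables Q_e would be replaced by statistics estimated from simulated rollouts; the maximization step above is what keeps the per-step planning cost from growing exponentially with the number of agents.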


Information

Published In

AAAI'15: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence
January 2015
4331 pages
ISBN: 0262511290

Sponsors

  • Association for the Advancement of Artificial Intelligence

Publisher

AAAI Press

Publication History

Published: 25 January 2015

Qualifiers

  • Article

Article Metrics

  • Downloads (last 12 months): 0
  • Downloads (last 6 weeks): 0

Reflects downloads up to 02 Sep 2024

Cited By

  • (2019) Distributed Policy Iteration for Scalable Approximation of Cooperative Multi-Agent Policies. In Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, 2162-2164. DOI: 10.5555/3306127.3332044
  • (2019) The Representational Capacity of Action-Value Networks for Multi-Agent Reinforcement Learning. In Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, 1862-1864. DOI: 10.5555/3306127.3331944
  • (2019) Distributed adaptive-neighborhood control for stochastic reachability in multi-agent systems. In Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing, 914-921. DOI: 10.1145/3297280.3297370
  • (2018) Interactive learning and decision making. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, 5703-5708. DOI: 10.5555/3304652.3304829
  • (2018) Please be an Influencer? In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, 1423-1431. DOI: 10.5555/3237383.3237912
  • (2018) Leveraging Statistical Multi-Agent Online Planning with Emergent Value Function Approximation. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, 730-738. DOI: 10.5555/3237383.3237491
  • (2017) Learning in POMDPs with Monte Carlo tree search. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, 1819-1827. DOI: 10.5555/3305381.3305569
  • (2017) Can bounded and self-interested agents be teammates? Application to planning in ad hoc teams. Autonomous Agents and Multi-Agent Systems 31(4):821-860. DOI: 10.1007/s10458-016-9354-4
  • (2016) Exploiting anonymity in approximate linear programming. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2537-2573. DOI: 10.5555/3016100.3016255
  • (2016) A Value Equivalence Approach for Solving Interactive Dynamic Influence Diagrams. In Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, 1162-1170. DOI: 10.5555/2936924.2937094
