DOI: 10.5555/2886521.2886598
Article

Scalable planning and learning for multiagent POMDPs

Published: 25 January 2015

Abstract

Online, sample-based planning algorithms for POMDPs have shown great promise in scaling to problems with large state spaces, but they become intractable for large action and observation spaces. This is particularly problematic in multiagent POMDPs, where the action and observation spaces grow exponentially with the number of agents. To combat this intractability, we propose a novel scalable approach based on sample-based planning and factored value functions that exploits structure present in many multiagent settings. This approach applies not only to planning but also to the Bayesian reinforcement learning setting. Experimental results show that we are able to provide high-quality solutions to large multiagent planning and learning problems.
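
To make the factored-value-function idea concrete, the sketch below shows the standard coordination-graph trick the abstract alludes to: if the joint Q-value decomposes into pairwise payoff tables over pairs of agents, the maximizing joint action can be recovered by variable elimination, agent by agent, instead of enumerating the exponentially large joint action space. This is a minimal illustration of the general technique, not the paper's implementation; the names (q_factors, eliminate_agent, argmax_joint_action) and the toy payoff tables are assumptions made for this example.

```python
# Minimal sketch (not the paper's implementation): choosing a joint action
# from a factored Q-function Q(a) = sum over edges e of Q_e(a_i, a_j),
# using variable elimination instead of enumerating all |A|^n joint actions.
# A factor is a (scope, table) pair: scope is a tuple of agent indices and
# table maps an action tuple (in scope order) to a real-valued payoff.
from itertools import product

def eliminate_agent(factors, agent, n_actions):
    """Max out one agent, returning the reduced factor list plus a
    best-response table used later to recover that agent's action."""
    touching = [f for f in factors if agent in f[0]]
    rest = [f for f in factors if agent not in f[0]]
    # The new factor's scope is every other agent the eliminated agent
    # shares a factor with (its neighbors in the coordination graph).
    scope = tuple(sorted({a for s, _ in touching for a in s} - {agent}))
    table, best = {}, {}
    for ctx_actions in product(range(n_actions), repeat=len(scope)):
        ctx = dict(zip(scope, ctx_actions))
        vals = []
        for ai in range(n_actions):
            ctx[agent] = ai
            vals.append(sum(t[tuple(ctx[a] for a in s)] for s, t in touching))
        best[ctx_actions] = max(range(n_actions), key=vals.__getitem__)
        table[ctx_actions] = vals[best[ctx_actions]]
    return rest + [(scope, table)], (scope, best)

def argmax_joint_action(n_agents, n_actions, factors):
    """Maximize a sum of local payoff tables by variable elimination."""
    traces = []
    for agent in range(n_agents):
        factors, trace = eliminate_agent(factors, agent, n_actions)
        traces.append((agent, trace))
    value = sum(t[()] for _, t in factors)  # only constant factors remain
    # Backtrack in reverse elimination order to read off the argmax:
    # each agent's scope contains only agents eliminated after it.
    action = {}
    for agent, (scope, best) in reversed(traces):
        action[agent] = best[tuple(action[a] for a in scope)]
    return [action[i] for i in range(n_agents)], value

# Toy example: three agents on a chain, Q = Q01(a0, a1) + Q12(a1, a2),
# where Q01 rewards agreement and Q12 rewards disagreement.
q_factors = [
    ((0, 1), {(i, j): float(i == j) for i in range(2) for j in range(2)}),
    ((1, 2), {(i, j): float(i != j) for i in range(2) for j in range(2)}),
]
print(argmax_joint_action(3, 2, q_factors))  # ([1, 1, 0], 2.0)
```

In a sample-based planner of the kind the abstract describes, the exact tables Q_e would be replaced by statistics estimated from simulated rollouts; the maximization step above is what keeps the per-step planning cost from growing exponentially with the number of agents.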


Information

Published In

AAAI'15: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence
January 2015
4331 pages
ISBN: 0262511290

Sponsors

  • Association for the Advancement of Artificial Intelligence

Publisher

AAAI Press

Publication History

Published: 25 January 2015

Qualifiers

  • Article

Article Metrics

  • Downloads (last 12 months): 0
  • Downloads (last 6 weeks): 0

Reflects downloads up to 02 Sep 2024

Cited By

  • (2019) Distributed Policy Iteration for Scalable Approximation of Cooperative Multi-Agent Policies. In Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, 2162-2164. DOI: 10.5555/3306127.3332044
  • (2019) The Representational Capacity of Action-Value Networks for Multi-Agent Reinforcement Learning. In Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, 1862-1864. DOI: 10.5555/3306127.3331944
  • (2019) Distributed adaptive-neighborhood control for stochastic reachability in multi-agent systems. In Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing, 914-921. DOI: 10.1145/3297280.3297370
  • (2018) Interactive learning and decision making. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, 5703-5708. DOI: 10.5555/3304652.3304829
  • (2018) Please be an Influencer? In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, 1423-1431. DOI: 10.5555/3237383.3237912
  • (2018) Leveraging Statistical Multi-Agent Online Planning with Emergent Value Function Approximation. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, 730-738. DOI: 10.5555/3237383.3237491
  • (2017) Learning in POMDPs with Monte Carlo tree search. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, 1819-1827. DOI: 10.5555/3305381.3305569
  • (2017) Can bounded and self-interested agents be teammates? Application to planning in ad hoc teams. Autonomous Agents and Multi-Agent Systems 31(4):821-860. DOI: 10.1007/s10458-016-9354-4
  • (2016) Exploiting anonymity in approximate linear programming. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2537-2573. DOI: 10.5555/3016100.3016255
  • (2016) A Value Equivalence Approach for Solving Interactive Dynamic Influence Diagrams. In Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, 1162-1170. DOI: 10.5555/2936924.2937094
