Article

Multiagent planning with factored MDPs

Authors:

NIPS'01: Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic

Pages 1523 - 1530

Published: 03 January 2001

Abstract

We present a principled and efficient planning algorithm for cooperative multiagent dynamic systems. A striking feature of our method is that the coordination and communication between the agents are not imposed, but derived directly from the system dynamics and function approximation architecture. We view the entire multiagent system as a single, large Markov decision process (MDP), which we assume can be represented in a factored way using a dynamic Bayesian network (DBN). The action space of the resulting MDP is the joint action space of the entire set of agents. Our approach is based on the use of factored linear value functions as an approximation to the joint value function. This factorization of the value function allows the agents to coordinate their actions at runtime using a natural message passing scheme. We provide a simple and efficient method for computing such an approximate value function by solving a single linear program, whose size is determined by the interaction between the value function structure and the DBN. We thereby avoid the exponential blowup in the state and action spaces. We show that our approach compares favorably with approaches based on reward sharing. We also show that our algorithm is an efficient alternative to more complicated algorithms even in the single agent case.
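
The abstract says that the factored value function lets the agents coordinate at runtime through message passing. As an illustration only, the Python sketch below shows one way such a coordination step can be computed, assuming that for the current state the joint value has already been reduced to a sum of local payoff functions, each depending on the actions of only a few agents. It maximizes that sum by variable elimination, in the style of nonserial dynamic programming and bucket elimination: each agent is eliminated in turn, which produces a new payoff function (a "message") over the agents it interacts with, and the maximizing joint action is then recovered by backtracking. The function name `best_joint_action`, the (scope, table) representation, and the example payoffs are hypothetical and are not taken from the paper.

```python
from itertools import product

# Hypothetical representation (not from the paper): a local payoff function is a
# pair (scope, table). `scope` is a tuple of agent names and `table` maps a tuple
# of those agents' actions, in scope order, to a number.

def best_joint_action(factors, domains, order):
    """Return (max total payoff, maximizing joint action) by variable elimination.

    factors: list of (scope, table) local payoff functions
    domains: dict mapping each agent to its list of actions
    order:   elimination order containing every agent exactly once
    """
    factors = list(factors)
    trace = []  # (agent, scope of its new factor, best-response table)
    for agent in order:
        # Collect every payoff function that mentions this agent.
        involved = [f for f in factors if agent in f[0]]
        factors = [f for f in factors if agent not in f[0]]
        # The new function depends only on the agent's not-yet-eliminated neighbors.
        scope = tuple(sorted({a for s, _ in involved for a in s if a != agent}))
        new_table, best_response = {}, {}
        for ctx in product(*(domains[a] for a in scope)):
            assignment = dict(zip(scope, ctx))
            best_val, best_act = None, None
            for act in domains[agent]:
                assignment[agent] = act
                val = sum(t[tuple(assignment[a] for a in s)] for s, t in involved)
                if best_val is None or val > best_val:
                    best_val, best_act = val, act
            new_table[ctx] = best_val        # "message": max over this agent's actions
            best_response[ctx] = best_act    # agent's best action given its neighbors
        factors.append((scope, new_table))
        trace.append((agent, scope, best_response))
    # Every remaining function now has an empty scope, i.e. it is a constant.
    max_value = sum(t[()] for _, t in factors)
    # Backtrack: agents eliminated later are fixed first, so each lookup is defined.
    joint = {}
    for agent, scope, best_response in reversed(trace):
        joint[agent] = best_response[tuple(joint[a] for a in scope)]
    return max_value, joint


# Usage: three agents on a chain, A1 - A2 - A3, with two pairwise payoffs.
domains = {"A1": [0, 1], "A2": [0, 1], "A3": [0, 1]}
q1 = (("A1", "A2"), {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 2})
q2 = (("A2", "A3"), {(0, 0): 3, (0, 1): 0, (1, 0): 0, (1, 1): 1})
print(best_joint_action([q1, q2], domains, ["A1", "A2", "A3"]))
# (4, {'A3': 0, 'A2': 0, 'A1': 0})
```

Eliminating an agent only involves the payoff functions it shares with other agents, so the work depends on how the agents interact rather than on the size of the full joint action space; a poor elimination order can still create large intermediate tables.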

Recommendations

Exploiting separability in multiagent planning with continuous-state MDPs
AAMAS '14: Proceedings of the 2014 international conference on Autonomous agents and multi-agent systems

Recent years have seen significant advances in techniques for optimally solving multiagent problems represented as decentralized partially observable Markov decision processes (Dec-POMDPs). A new method achieves scalability gains by converting Dec-...
Efficient planning for factored infinite-horizon DEC-POMDPs
IJCAI'11: Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume One

Decentralized partially observable Markov decision processes (DEC-POMDPs) are used to plan policies for multiple agents that must maximize a joint reward function but do not communicate with each other. The agents act under uncertainty about each other ...
Context-specific multiagent coordination and planning with factored MDPs
Eighteenth national conference on Artificial intelligence

We present an algorithm for coordinated decision making in cooperative multiagent settings, where the agents' value function can be represented as a sum of context-specific value rules. The task of finding an optimal joint action in this setting ...

Information

Published In

NIPS'01: Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic

January 2001

1594 pages

Publisher

MIT Press

Cambridge, MA, United States

Publication History

Published: 03 January 2001

Qualifiers

Article

Bibliometrics

Total citations: 38
Total downloads: 0
Downloads (last 12 months): 0
Downloads (last 6 weeks): 0
Reflects downloads up to 26 Jul 2024

Cited By

Mahmud R, Faisal F, Mahmud S, Khan M (2022). A Simulation Based Online Planning Algorithm for Multi-Agent Cooperative Environments. In Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems (eds. Pelachaud C, Taylor M, Faliszewski P, Mascardi V), pp. 1690-1692. Online publication date: 9-May-2022.
https://dl.acm.org/doi/10.5555/3535850.3536078
Verstraeten T, Daems P, Bargiacchi E, Roijers D, Libin P, Helsen J (2021). Scalable Optimization for Wind Farm Control using Coordination Graphs. In Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems (eds. Dignum F, Lomuscio A, Endriss U, Nowé A), pp. 1362-1370. Online publication date: 3-May-2021.
https://dl.acm.org/doi/10.5555/3463952.3464109
Troullinos D, Chalkiadakis G, Papamichail I, Papageorgiou M (2021). Collaborative Multiagent Decision Making for Lane-Free Autonomous Driving. In Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems (eds. Dignum F, Lomuscio A, Endriss U, Nowé A), pp. 1335-1343. Online publication date: 3-May-2021.
https://dl.acm.org/doi/10.5555/3463952.3464106
Li S, Gupta J, Morales P, Allen R, Kochenderfer M (2021). Deep Implicit Coordination Graphs for Multi-agent Reinforcement Learning. In Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems (eds. Dignum F, Lomuscio A, Endriss U, Nowé A), pp. 764-772. Online publication date: 3-May-2021.
https://dl.acm.org/doi/10.5555/3463952.3464044
Choudhury S, Gupta J, Morales P, Kochenderfer M (2021). Scalable Anytime Planning for Multi-Agent MDPs. In Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems (eds. Dignum F, Lomuscio A, Endriss U, Nowé A), pp. 341-349. Online publication date: 3-May-2021.
https://dl.acm.org/doi/10.5555/3463952.3463997
Oliehoek F, Witwicki S, Kaelbling L (2021). A Sufficient Statistic for Influence in Structured Multiagent Environments. Journal of Artificial Intelligence Research 70, pp. 789-870. Online publication date: 1-May-2021.
https://dl.acm.org/doi/10.1613/jair.1.12136
Tian Y, Qian J, Sra S (2020). Towards minimax optimal reinforcement learning in factored Markov decision processes. In Proceedings of the 34th International Conference on Neural Information Processing Systems (eds. Larochelle H, Ranzato M, Hadsell R, Balcan M, Lin H), pp. 19896-19907. Online publication date: 6-Dec-2020.
https://dl.acm.org/doi/10.5555/3495724.3497394
Xu Z, Tewari A (2020). Reinforcement learning in factored MDPs. In Proceedings of the 34th International Conference on Neural Information Processing Systems (eds. Larochelle H, Ranzato M, Hadsell R, Balcan M, Lin H), pp. 18226-18236. Online publication date: 6-Dec-2020.
https://dl.acm.org/doi/10.5555/3495724.3497254
de Witt C, Foerster J, Farquhar G, Torr P, Böhmer W, Whiteson S (2019). Multi-agent common knowledge reinforcement learning. In Proceedings of the 33rd International Conference on Neural Information Processing Systems (eds. Wallach H, Larochelle H, Beygelzimer A, d'Alché-Buc F, Fox E), pp. 9927-9939. Online publication date: 8-Dec-2019.
https://dl.acm.org/doi/10.5555/3454287.3455177
Mahajan A, Rashid T, Samvelyan M, Whiteson S (2019). MAVEN. In Proceedings of the 33rd International Conference on Neural Information Processing Systems (eds. Wallach H, Larochelle H, Beygelzimer A, d'Alché-Buc F, Fox E), pp. 7613-7624. Online publication date: 8-Dec-2019.
https://dl.acm.org/doi/10.5555/3454287.3454971
