
Multiagent planning with factored MDPs

Published: 03 January 2001

Abstract

    We present a principled and efficient planning algorithm for cooperative multiagent dynamic systems. A striking feature of our method is that the coordination and communication between the agents are not imposed, but derived directly from the system dynamics and function approximation architecture. We view the entire multiagent system as a single, large Markov decision process (MDP), which we assume can be represented in a factored way using a dynamic Bayesian network (DBN). The action space of the resulting MDP is the joint action space of the entire set of agents. Our approach is based on the use of factored linear value functions as an approximation to the joint value function. This factorization of the value function allows the agents to coordinate their actions at runtime using a natural message-passing scheme. We provide a simple and efficient method for computing such an approximate value function by solving a single linear program, whose size is determined by the interaction between the value function structure and the DBN. We thereby avoid the exponential blowup in the state and action space. We show that our approach compares favorably with approaches based on reward sharing. We also show that our algorithm is an efficient alternative to more complicated algorithms even in the single-agent case.
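
    The runtime coordination step can be illustrated with a small sketch. The abstract only states that agents coordinate through message passing over a factored value function; the code below assumes this is realized by variable elimination over a sum of local Q-functions, in the style of nonserial dynamic programming. The function name `eliminate_max`, the data layout, and the three-agent example values are hypothetical and chosen only for illustration.

```python
from itertools import product

def eliminate_max(factors, domains, order):
    """Maximize a sum of local Q-functions over the joint action.

    factors: list of (scope, table); scope is a tuple of agent ids and
             table maps a tuple of their actions to a value (assumed layout).
    domains: dict mapping each agent id to its tuple of actions.
    order:   elimination order covering every agent.
    """
    factors = list(factors)
    responses = []  # (agent, neighbors, best-action table), in elimination order
    for agent in order:
        involved = [f for f in factors if agent in f[0]]
        factors = [f for f in factors if agent not in f[0]]
        neighbors = tuple(sorted({v for scope, _ in involved
                                  for v in scope if v != agent}))
        message, choice = {}, {}
        for ctx in product(*(domains[v] for v in neighbors)):
            assignment = dict(zip(neighbors, ctx))
            best_val, best_act = None, None
            for act in domains[agent]:
                assignment[agent] = act
                val = sum(table[tuple(assignment[v] for v in scope)]
                          for scope, table in involved)
                if best_val is None or val > best_val:
                    best_val, best_act = val, act
            message[ctx] = best_val  # "message" sent to the remaining agents
            choice[ctx] = best_act   # agent's best response to its neighbors
        factors.append((neighbors, message))
        responses.append((agent, neighbors, choice))
    # Every remaining factor now has an empty scope.
    total = sum(table[()] for _, table in factors)
    # Backward pass: each agent picks its action given later-eliminated agents.
    joint = {}
    for agent, neighbors, choice in reversed(responses):
        joint[agent] = choice[tuple(joint[v] for v in neighbors)]
    return total, joint

# Hypothetical chain of three agents with binary actions.
domains = {1: (0, 1), 2: (0, 1), 3: (0, 1)}
q12 = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 2.0}
q23 = {(0, 0): 3.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.5}
factors = [((1, 2), q12), ((2, 3), q23)]

total, joint = eliminate_max(factors, domains, order=[1, 2, 3])
print(total, joint)
```

    In this sketch each agent only needs the local tables that mention it, so the cost depends on the size of the largest neighbor set created during elimination rather than on the size of the full joint action space.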

    Published In

    NIPS'01: Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic
    January 2001
    1594 pages

    Publisher

    MIT Press

    Cambridge, MA, United States
