DOI: 10.1145/375735.376302
Article

Hierarchical multi-agent reinforcement learning

Published: 28 May 2001
    Abstract

    In this paper we investigate the use of hierarchical reinforcement learning to speed up the acquisition of cooperative multi-agent tasks. We extend the MAXQ framework to the multi-agent case: each agent uses the same MAXQ hierarchy to decompose a task into subtasks. Learning is decentralized, with each agent learning three interrelated skills: how to perform subtasks, the order in which to do them, and how to coordinate with other agents. Coordination skills are learned by using joint actions at the highest level(s) of the hierarchy: the Q nodes at those levels are configured to represent the joint task-action space among the agents. In this approach, each agent knows only what the other agents are doing at the level of subtasks and is unaware of their lower-level (primitive) actions. This hierarchical approach allows agents to learn coordination faster by sharing information at the level of subtasks, rather than attempting to learn coordination over primitive joint state-action values. We apply this hierarchical multi-agent reinforcement learning algorithm to a complex AGV (automated guided vehicle) scheduling task and compare its performance and learning speed with other approaches, including flat multi-agent learning, a single agent using MAXQ, selfish multiple agents using MAXQ (where each agent acts independently, without communicating with the others), and several well-known AGV heuristics such as "first come first serve", "highest queue first", and "nearest station first". We also compare the tradeoff between learning speed and performance when joint action values are modeled at multiple levels of the MAXQ hierarchy.
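
    The coordination scheme described above lends itself to a short sketch. The following Python fragment is a minimal illustration, not the authors' implementation: it uses plain tabular Q-learning at each of two levels rather than the full MAXQ value-function decomposition (which maintains separate subtask-value and completion terms), and every name in it (CooperativeMAXQAgent, q_root, q_sub) is invented for this example. What it does capture is the paper's coordination idea: the root-level Q-table is indexed by the subtasks of the other agents, while the subtask-level tables are purely local.

    ```python
    import random
    from collections import defaultdict

    class CooperativeMAXQAgent:
        """Illustrative two-level agent: joint task-action values at the
        root, ordinary single-agent Q-values inside each subtask."""

        def __init__(self, subtasks, actions, alpha=0.1, gamma=0.95, eps=0.1):
            self.subtasks = subtasks      # high-level subtask labels
            self.actions = actions        # primitive actions
            self.alpha, self.gamma, self.eps = alpha, gamma, eps
            # Root level: (state, own subtask, other agents' subtasks) -> value.
            # Conditioning on the other agents' *subtasks*, never their
            # primitive actions, is the coordination mechanism sketched here.
            self.q_root = defaultdict(float)
            # Subtask level: (subtask, state, primitive action) -> value,
            # learned locally with no information about other agents.
            self.q_sub = defaultdict(float)

        def choose_subtask(self, state, others_subtasks):
            """Epsilon-greedy choice over the joint task-action values."""
            if random.random() < self.eps:
                return random.choice(self.subtasks)
            return max(self.subtasks,
                       key=lambda t: self.q_root[(state, t, others_subtasks)])

        def choose_action(self, subtask, state):
            """Epsilon-greedy choice among a subtask's primitive actions."""
            if random.random() < self.eps:
                return random.choice(self.actions)
            return max(self.actions,
                       key=lambda a: self.q_sub[(subtask, state, a)])

        def update_root(self, state, subtask, others, reward,
                        next_state, next_others, n_steps):
            """SMDP-style update: the subtask ran for n_steps primitive
            steps, so the future term is discounted by gamma ** n_steps."""
            best_next = max(self.q_root[(next_state, t, next_others)]
                            for t in self.subtasks)
            key = (state, subtask, others)
            target = reward + (self.gamma ** n_steps) * best_next
            self.q_root[key] += self.alpha * (target - self.q_root[key])

        def update_sub(self, subtask, state, action, reward, next_state):
            """One-step Q-learning update inside a subtask."""
            best_next = max(self.q_sub[(subtask, next_state, a)]
                            for a in self.actions)
            key = (subtask, state, action)
            target = reward + self.gamma * best_next
            self.q_sub[key] += self.alpha * (target - self.q_sub[key])
    ```

    Here others_subtasks would be a tuple of subtask labels exchanged between agents whenever a subtask begins or terminates, so coordination information flows only at subtask boundaries rather than at every primitive step; this is what keeps the joint space at the top of the hierarchy tractable.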



    Information

    Published In

    AGENTS '01: Proceedings of the Fifth International Conference on Autonomous Agents
    May 2001
    662 pages
    ISBN: 158113326X
    DOI: 10.1145/375735
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • Article

    Conference

    AGENTS '01: Autonomous Agents 2001
    Montreal, Quebec, Canada

    Acceptance Rates

    AGENTS '01 paper acceptance rate: 66 of 248 submissions (27%).
    Overall acceptance rate: 182 of 599 submissions (30%).

    Article Metrics

    • Downloads (last 12 months): 166
    • Downloads (last 6 weeks): 19

    Reflects downloads up to 26 Jul 2024.

    Cited By

    • (2023) "Feudal Latent Space Exploration for Coordinated Multi-Agent Reinforcement Learning." IEEE Transactions on Neural Networks and Learning Systems, 34(10):7775-7783. DOI: 10.1109/TNNLS.2022.3146201. Online publication date: Oct 2023.
    • (2023) "Reinforcement Learning for Multiaircraft Autonomous Air Combat in Multisensor UCAV Platform." IEEE Sensors Journal, 23(18):20596-20606. DOI: 10.1109/JSEN.2022.3220324. Online publication date: 15 Sep 2023.
    • (2023) "Asynchronous, Option-Based Multi-Agent Policy Gradient: A Conditional Reasoning Approach." 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 7348-7353. DOI: 10.1109/IROS55552.2023.10342281. Online publication date: 1 Oct 2023.
    • (2023) "Deep reinforcement learning in recommender systems: A survey and new perspectives." Knowledge-Based Systems, 264:110335. DOI: 10.1016/j.knosys.2023.110335. Online publication date: Mar 2023.
    • (2023) "Smart mobile robot fleet management based on hierarchical multi-agent deep Q network towards intelligent manufacturing." Engineering Applications of Artificial Intelligence, 124:106534. DOI: 10.1016/j.engappai.2023.106534. Online publication date: Oct 2023.
    • (2023) "Robot Subgoal-guided Navigation in Dynamic Crowded Environments with Hierarchical Deep Reinforcement Learning." International Journal of Control, Automation and Systems, 21(7):2350-2362. DOI: 10.1007/s12555-022-0171-z. Online publication date: 5 May 2023.
    • (2023) "Reinforcement Learning for Multi-Agent Stochastic Resource Collection." Machine Learning and Knowledge Discovery in Databases, 200-215. DOI: 10.1007/978-3-031-26412-2_13. Online publication date: 17 Mar 2023.
    • (2022) "E-MAPP." Proceedings of the 36th International Conference on Neural Information Processing Systems, 12154-12168. DOI: 10.5555/3600270.3601153. Online publication date: 28 Nov 2022.
    • (2022) "An In-Depth Analysis of Cooperative Multi-Robot Hierarchical Reinforcement Learning." Proceedings of the 7th International Conference on Sustainable Information Engineering and Technology, 119-126. DOI: 10.1145/3568231.3568258. Online publication date: 22 Nov 2022.
    • (2022) "Hierarchical Multiagent Reinforcement Learning for Allocating Guaranteed Display Ads." IEEE Transactions on Neural Networks and Learning Systems, 33(10):5361-5373. DOI: 10.1109/TNNLS.2021.3070484. Online publication date: Oct 2022.
