DOI: 10.24963/ijcai.2023/31

Towards a better understanding of learning with multiagent teams

Published: 19 August 2023

Abstract

While it has long been recognized that a team of individual learning agents can be greater than the sum of its parts, recent work has shown that larger teams are not necessarily more effective than smaller ones. In this paper, we study why, and under which conditions, certain team structures promote effective learning for a population of individual learning agents. We show that, depending on the environment, some team structures help agents learn to specialize into specific roles, leading to better global outcomes. However, large teams create credit-assignment challenges that reduce coordination, causing them to underperform smaller teams. We support our conclusions with both theoretical analysis and empirical results.
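The credit-assignment difficulty the abstract alludes to can be illustrated with a toy model (this is an illustrative sketch, not the paper's method): if every teammate receives the same shared reward, say the mean of all individual contributions, then an agent's own contribution correlates ever more weakly with the reward it observes as the team grows, diluting its learning signal. The function name and setup below are hypothetical.

```python
import random

def credit_signal_noise(team_size, episodes=2000, seed=0):
    """Toy illustration: each agent contributes a random amount, and all
    agents share the mean contribution as their reward. Returns the
    correlation between agent 0's own contribution and the shared reward;
    for i.i.d. contributions this correlation is 1/sqrt(team_size), so
    larger teams give each agent a noisier credit-assignment signal."""
    rng = random.Random(seed)
    own, reward = [], []
    for _ in range(episodes):
        contributions = [rng.random() for _ in range(team_size)]
        own.append(contributions[0])                    # agent 0's share
        reward.append(sum(contributions) / team_size)   # shared team reward
    # Pearson correlation between agent 0's contribution and the reward
    n = episodes
    mo, mr = sum(own) / n, sum(reward) / n
    cov = sum((o - mo) * (r - mr) for o, r in zip(own, reward)) / n
    vo = sum((o - mo) ** 2 for o in own) / n
    vr = sum((r - mr) ** 2 for r in reward) / n
    return cov / (vo * vr) ** 0.5

small_team = credit_signal_noise(2)    # close to 1/sqrt(2) ~ 0.71
large_team = credit_signal_noise(16)   # close to 1/sqrt(16) = 0.25
```

In this toy setting the signal decays as the inverse square root of team size, which gives some intuition for why, beyond a point, adding teammates can hurt learning even when the team's joint capacity grows.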


Published In

IJCAI '23: Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence
August 2023, 7242 pages
ISBN: 978-1-956792-03-4

Sponsors

• International Joint Conferences on Artificial Intelligence (IJCAI)

