DOI: 10.24963/ijcai.2023/31

Towards a better understanding of learning with multiagent teams

Published: 19 August 2023

Abstract

While it has long been recognized that a team of individual learning agents can be greater than the sum of its parts, recent work has shown that larger teams are not necessarily more effective than smaller ones. In this paper, we study why, and under which conditions, certain team structures promote effective learning for a population of individual learning agents. We show that, depending on the environment, some team structures help agents learn to specialize into specific roles, leading to better global outcomes. However, large teams create credit-assignment challenges that reduce coordination, causing them to underperform smaller teams. We support our conclusions with both theoretical analysis and empirical results.
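The credit-assignment difficulty the abstract alludes to can be illustrated with a toy model (this is an illustrative sketch, not the paper's method): if every teammate receives the same shared reward, say the mean of all individual contributions, then an agent's own contribution correlates ever more weakly with the reward it observes as the team grows, diluting its learning signal. The function name and setup below are hypothetical.

```python
import random

def credit_signal_noise(team_size, episodes=2000, seed=0):
    """Toy illustration: each agent contributes a random amount, and all
    agents share the mean contribution as their reward. Returns the
    correlation between agent 0's own contribution and the shared reward;
    for i.i.d. contributions this correlation is 1/sqrt(team_size), so
    larger teams give each agent a noisier credit-assignment signal."""
    rng = random.Random(seed)
    own, reward = [], []
    for _ in range(episodes):
        contributions = [rng.random() for _ in range(team_size)]
        own.append(contributions[0])                    # agent 0's share
        reward.append(sum(contributions) / team_size)   # shared team reward
    # Pearson correlation between agent 0's contribution and the reward
    n = episodes
    mo, mr = sum(own) / n, sum(reward) / n
    cov = sum((o - mo) * (r - mr) for o, r in zip(own, reward)) / n
    vo = sum((o - mo) ** 2 for o in own) / n
    vr = sum((r - mr) ** 2 for r in reward) / n
    return cov / (vo * vr) ** 0.5

small_team = credit_signal_noise(2)    # close to 1/sqrt(2) ~ 0.71
large_team = credit_signal_noise(16)   # close to 1/sqrt(16) = 0.25
```

In this toy setting the signal decays as the inverse square root of team size, which gives some intuition for why, beyond a point, adding teammates can hurt learning even when the team's joint capacity grows.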


Published In

IJCAI '23: Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence
August 2023, 7242 pages
ISBN: 978-1-956792-03-4

Sponsors

• International Joint Conferences on Artificial Intelligence (IJCAI)

