Multiagent Gumbel MuZero: Efficient Planning in Combinatorial Action Spaces

Authors

  • Xiaotian Hao, College of Intelligence and Computing, Tianjin University
  • Jianye Hao, College of Intelligence and Computing, Tianjin University; Noah’s Ark Lab, Huawei
  • Chenjun Xiao, Noah’s Ark Lab, Huawei
  • Kai Li, Noah’s Ark Lab, Huawei
  • Dong Li, Noah’s Ark Lab, Huawei
  • Yan Zheng, College of Intelligence and Computing, Tianjin University

DOI:

https://doi.org/10.1609/aaai.v38i11.29121

Keywords:

ML: Reinforcement Learning, MAS: Coordination and Collaboration, SO: Sampling/Simulation-based Search

Abstract

AlphaZero and MuZero have achieved state-of-the-art (SOTA) performance in a wide range of domains, including board games and robotics, with discrete and continuous action spaces. However, to obtain an improved policy, they often require an excessively large number of simulations, especially for domains with large action spaces. As the simulation budget decreases, their performance drops significantly. In addition, many important real-world applications have combinatorial (or exponential) action spaces, making it infeasible to search directly over all possible actions. In this paper, we extend AlphaZero and MuZero to learn and plan in more complex multiagent (MA) Markov decision processes, where the action spaces increase exponentially with the number of agents. Our new algorithms, MA Gumbel AlphaZero and MA Gumbel MuZero, respectively without and with model learning, achieve superior performance on cooperative multiagent control problems, while reducing the number of environment interactions by up to an order of magnitude compared to model-free approaches. In particular, we significantly improve on prior performance when planning with a much smaller simulation budget. The code and appendix are available at https://github.com/tjuHaoXiaotian/MA-MuZero.

Published

2024-03-24

How to Cite

Hao, X., Hao, J., Xiao, C., Li, K., Li, D., & Zheng, Y. (2024). Multiagent Gumbel MuZero: Efficient Planning in Combinatorial Action Spaces. Proceedings of the AAAI Conference on Artificial Intelligence, 38(11), 12304-12312. https://doi.org/10.1609/aaai.v38i11.29121

Issue

Vol. 38 No. 11 (2024)

Section

AAAI Technical Track on Machine Learning II