DOI: 10.5555/3545946.3598620

TiZero: Mastering Multi-Agent Football with Curriculum Learning and Self-Play

Published: 30 May 2023

Abstract

Multi-agent football poses an unsolved challenge in AI research. Existing work has focused on tackling simplified scenarios of the game, or else on leveraging expert demonstrations. In this paper, we develop a multi-agent system to play the full 11 vs. 11 game mode, without demonstrations. This game mode contains aspects that present major challenges to modern reinforcement learning algorithms: multi-agent coordination, long-term planning, and non-transitivity. To address these challenges, we present TiZero, a self-evolving, multi-agent system that learns from scratch. TiZero introduces several innovations, including adaptive curriculum learning, a novel self-play strategy, and an objective that optimizes the policies of multiple agents jointly. Experimentally, it outperforms previous systems by a large margin on the Google Research Football environment, increasing win rates by over 30%. To demonstrate the generality of TiZero's innovations, they are assessed on several environments beyond football: Overcooked, the Multi-Agent Particle Environment, Tic-Tac-Toe, and Connect-Four.
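The abstract names three technical ingredients: adaptive curriculum learning, a self-play strategy, and a jointly optimized multi-agent objective. As a rough illustration of how the first two commonly fit together in systems of this kind, the sketch below pairs snapshot-based self-play (opponents drawn from a pool of frozen past policies) with a win-rate-gated curriculum. It is a minimal sketch under our own assumptions; every name in it (Policy, play_match, promote_at) is hypothetical, and it is not TiZero's actual algorithm.

```python
# Illustrative sketch only -- not TiZero's algorithm. It shows the usual shape
# of self-play with an adaptive curriculum: opponents are sampled from a pool
# of frozen past snapshots, and scenario difficulty is raised once the
# learner's recent win rate clears a threshold. All names are hypothetical.
import math
import random
from dataclasses import dataclass


@dataclass
class Policy:
    """Stand-in for a trained policy; `skill` is a toy proxy for strength."""
    skill: float = 0.0


def play_match(learner: Policy, opponent: Policy, difficulty: int) -> bool:
    """Toy match outcome: win probability rises with the skill gap and
    falls as the curriculum difficulty increases."""
    edge = learner.skill - opponent.skill - 0.1 * difficulty
    return random.random() < 1.0 / (1.0 + math.exp(-edge))


def train_loop(iterations: int = 1000, promote_at: float = 0.7) -> None:
    learner = Policy()
    pool = [Policy()]                # frozen past snapshots = self-play opponents
    difficulty, wins, games = 0, 0, 0
    for step in range(iterations):
        opponent = random.choice(pool)           # sample a past self to play
        wins += play_match(learner, opponent, difficulty)
        games += 1
        learner.skill += 0.01                    # stand-in for a gradient update
        if step % 50 == 49:                      # periodic evaluation point
            pool.append(Policy(skill=learner.skill))  # snapshot current policy
            if wins / games >= promote_at:       # curriculum: promote on success
                difficulty += 1
            wins, games = 0, 0


if __name__ == "__main__":
    random.seed(0)
    train_loop()
```

In TiZero itself the snapshot pool, the opponent-sampling rule, and the promotion criterion are all more elaborate (see the paper for details); the sketch only conveys the control flow shared by this family of methods.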


Cited By

  • (2024) Multi-Agent Diagnostics for Robustness via Illuminated Diversity. In Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, 1630-1644. DOI: 10.5555/3635637.3663024. Online publication date: 6-May-2024.


Information

Published In

AAMAS '23: Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems
May 2023
3131 pages
ISBN:9781450394321
  • General Chairs: Noa Agmon, Bo An
  • Program Chairs: Alessandro Ricci, William Yeoh

Publisher

International Foundation for Autonomous Agents and Multiagent Systems

Richland, SC

Publication History

Published: 30 May 2023

Author Tags

  1. google research football
  2. large-scale training
  3. multi-agent reinforcement learning
  4. self-play

Qualifiers

  • Research-article

Conference

AAMAS '23

Acceptance Rates

Overall Acceptance Rate 1,155 of 5,036 submissions, 23%

Bibliometrics & Citations

Article Metrics

  • Downloads (Last 12 months)48
  • Downloads (Last 6 weeks)5
Reflects downloads up to 25 Feb 2025

