Abstract
Multi-agent reinforcement learning relies on reward signals from the environment to guide the convergence of each agent’s policy network. In a high-dimensional continuous space, however, the non-stationary environment can supply outdated experiences that prevent convergence. Existing methods often fail to reach satisfactory training performance because of this inherent non-stationarity of multi-agent systems. We propose a novel reinforcement learning scheme, MADSC, to generate an optimized cooperative policy. Our scheme uses mutual information to evaluate an intrinsic reward function that, based on the option framework, can generate a cooperative policy. In addition, linking the learned skills into a skill chain significantly accelerates the convergence of agent learning. Multi-agent systems can therefore use MADSC to gain strategic advantages by significantly reducing the number of learning steps. Experiments are performed on SMAC multi-agent tasks of varying difficulty. The results demonstrate that our proposed scheme outperforms state-of-the-art methods, including IQL, QMIX, and hDQN, with a single layer of temporal abstraction.
*Supported by The Belt and Road Special Foundation of the State Key Laboratory of Hydrology-Water Resources and Hydraulic Engineering under Grant 2021490811, and the National Natural Science Foundation of China under Grant No. 61872171.
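The abstract does not give the exact form of the mutual-information intrinsic reward, so the following is only a minimal sketch of one common variational formulation, r(s, z) = log q(z | s) - log p(z), with a uniform prior over skills. It is not claimed to be the MADSC reward. The function names, the logits, and the number of skills are hypothetical and chosen purely for illustration.

```python
import math


def skill_posterior(logits):
    """Softmax over discriminator logits, giving q(z | s) for each skill z."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def intrinsic_reward(logits, skill):
    """Variational mutual-information reward: log q(z | s) - log p(z).

    Assumes a uniform prior p(z) over the skills (an assumption of this
    sketch, not something stated in the abstract).
    """
    q = skill_posterior(logits)
    prior = 1.0 / len(logits)
    return math.log(q[skill]) - math.log(prior)


# Hypothetical discriminator outputs for one state and three skills.
logits = [2.0, 0.5, -1.0]
reward_for_likely_skill = intrinsic_reward(logits, 0)
reward_for_unlikely_skill = intrinsic_reward(logits, 2)
```

Under this formulation the reward is positive when the visited state makes the active skill easier to identify than chance and negative otherwise, which is one way such a term can encourage skills to stay distinguishable from one another.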
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Xie, Z., Ji, C., Zhang, Y. (2022). Deep Skill Chaining with Diversity for Multi-agent Systems*. In: Fang, L., Povey, D., Zhai, G., Mei, T., Wang, R. (eds) Artificial Intelligence. CICAI 2022. Lecture Notes in Computer Science, vol 13606. Springer, Cham. https://doi.org/10.1007/978-3-031-20503-3_17
DOI: https://doi.org/10.1007/978-3-031-20503-3_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20502-6
Online ISBN: 978-3-031-20503-3
eBook Packages: Computer Science, Computer Science (R0)