Regularization of the Policy Updates for Stabilizing Mean Field Games

Published: 28 May 2023

Abstract

This work studies non-cooperative Multi-Agent Reinforcement Learning (MARL), in which multiple agents interact in the same environment and each aims to maximize its individual return. Challenges arise when scaling up the number of agents because of the non-stationarity that the many agents introduce. To address this issue, Mean Field Games (MFG) rely on symmetry and homogeneity assumptions to approximate games with very large populations. Recently, deep Reinforcement Learning has been used to scale MFG to games with a larger number of states. Current methods rely on smoothing techniques such as averaging the q-values or the updates of the mean-field distribution. This work presents a different approach to stabilizing the learning, based on proximal updates of the mean-field policy. We name our algorithm Mean Field Proximal Policy Optimization (MF-PPO), and we empirically show the effectiveness of our method in the OpenSpiel framework.
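The "proximal updates" the abstract refers to build on the clipped surrogate objective of Proximal Policy Optimization (Schulman et al., 2017), which limits how far each policy update can move from the previous policy. The sketch below is an illustrative NumPy implementation of that standard clipped loss only — it is not the authors' MF-PPO code, and the function and variable names are assumptions made for the example.

```python
import numpy as np

def clipped_surrogate_loss(new_probs, old_probs, advantages, eps=0.2):
    """PPO clipped surrogate loss.

    Penalizes updates whose probability ratio new/old leaves the
    trust region [1 - eps, 1 + eps], which is the stabilizing
    mechanism proximal policy updates provide.
    """
    ratio = new_probs / old_probs
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # PPO maximizes the elementwise minimum of the two terms; the
    # negated mean is returned so it can be minimized by an optimizer.
    return -np.mean(np.minimum(unclipped, clipped))

# Example: a ratio of 1.25 exceeds the default trust region and is
# clipped back to 1.2 before contributing to the objective.
loss = clipped_surrogate_loss(np.array([0.5]), np.array([0.4]), np.array([1.0]))
```

In a mean-field setting, one would apply such an update to the representative agent's policy while the mean-field distribution is held fixed, rather than smoothing q-values or distribution updates as in prior methods.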



Published In

Advances in Knowledge Discovery and Data Mining: 27th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2023, Osaka, Japan, May 25–28, 2023, Proceedings, Part II
May 2023
562 pages
ISBN:978-3-031-33376-7
DOI:10.1007/978-3-031-33377-4
  • Editors: Hisashi Kashima, Tsuyoshi Ide, Wen-Chih Peng

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. Reinforcement learning
  2. mean-field games
  3. proximal policy optimization
