Regularization of the Policy Updates for Stabilizing Mean Field Games

Published: 28 May 2023

Abstract

This work studies non-cooperative Multi-Agent Reinforcement Learning (MARL), in which multiple agents interact in the same environment and each aims to maximize its individual return. Challenges arise when scaling up the number of agents because of the non-stationarity that the many agents introduce. To address this issue, Mean Field Games (MFG) rely on symmetry and homogeneity assumptions to approximate games with very large populations. Recently, deep Reinforcement Learning has been used to scale MFG to games with a larger number of states. Current methods rely on smoothing techniques such as averaging the q-values or the updates of the mean-field distribution. This work presents a different approach to stabilizing the learning, based on proximal updates of the mean-field policy. We name our algorithm Mean Field Proximal Policy Optimization (MF-PPO), and we empirically show the effectiveness of our method in the OpenSpiel framework.
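The "proximal updates" the abstract refers to build on the clipped surrogate objective of Proximal Policy Optimization (Schulman et al., 2017), which limits how far each policy update can move from the previous policy. The sketch below is an illustrative NumPy implementation of that standard clipped loss only — it is not the authors' MF-PPO code, and the function and variable names are assumptions made for the example.

```python
import numpy as np

def clipped_surrogate_loss(new_probs, old_probs, advantages, eps=0.2):
    """PPO clipped surrogate loss.

    Penalizes updates whose probability ratio new/old leaves the
    trust region [1 - eps, 1 + eps], which is the stabilizing
    mechanism proximal policy updates provide.
    """
    ratio = new_probs / old_probs
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # PPO maximizes the elementwise minimum of the two terms; the
    # negated mean is returned so it can be minimized by an optimizer.
    return -np.mean(np.minimum(unclipped, clipped))

# Example: a ratio of 1.25 exceeds the default trust region and is
# clipped back to 1.2 before contributing to the objective.
loss = clipped_surrogate_loss(np.array([0.5]), np.array([0.4]), np.array([1.0]))
```

In a mean-field setting, one would apply such an update to the representative agent's policy while the mean-field distribution is held fixed, rather than smoothing q-values or distribution updates as in prior methods.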



Published In

Advances in Knowledge Discovery and Data Mining: 27th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2023, Osaka, Japan, May 25–28, 2023, Proceedings, Part II
May 2023
562 pages
ISBN:978-3-031-33376-7
DOI:10.1007/978-3-031-33377-4
  • Editors: Hisashi Kashima, Tsuyoshi Ide, Wen-Chih Peng

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. Reinforcement learning
  2. mean-field games
  3. proximal policy optimization
