DOI: 10.5555/3702676.3702748

No-regret learning of Nash equilibrium for black-box games via Gaussian processes

Published: 15 July 2024

Abstract

This paper investigates the challenge of learning in black-box games, where the underlying utility function is unknown to all of the agents. While there is an extensive body of literature on the theoretical analysis of algorithms for computing the Nash equilibrium with complete information about the game, studies of Nash equilibria in black-box games are far less common. In this paper, we focus on learning a Nash equilibrium when the only available information about an agent's payoff comes in the form of empirical queries. We provide a no-regret learning algorithm that utilizes Gaussian processes to identify the equilibrium in such games. Our approach not only guarantees a theoretical convergence rate but also demonstrates its effectiveness on a diverse collection of games through experimental validation.
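The abstract does not spell out the algorithm itself, but the general idea it names (fitting a Gaussian-process surrogate to an unknown payoff from noisy empirical queries and selecting actions optimistically) can be illustrated with a minimal single-agent sketch. Everything below is an assumption for illustration: the toy payoff function, the RBF length-scale, the exploration weight `beta`, and the discretized action grid are all invented, and this is an upper-confidence-bound heuristic, not the paper's actual method.

```python
import numpy as np

def rbf(a, b, ls=0.2):
    # Squared-exponential kernel between two 1-D input arrays.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(X, y, Xs, noise=1e-2):
    # Standard GP regression posterior mean and std at test points Xs.
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    Kss = rbf(Xs, Xs)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ y
    var = np.diag(Kss - Ks.T @ Kinv @ Ks)
    return mu, np.sqrt(np.maximum(var, 1e-12))

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 101)        # discretized action space (assumed)
payoff = lambda x: -(x - 0.7) ** 2       # unknown black-box utility (toy example)
best = payoff(grid).max()

X, y = [], []
regret = 0.0
for t in range(30):
    if len(X) < 2:
        x = rng.choice(grid)             # a couple of initial random queries
    else:
        mu, sd = gp_posterior(np.array(X), np.array(y), grid)
        beta = 2.0                       # exploration weight (heuristic choice)
        x = grid[np.argmax(mu + beta * sd)]  # optimistic (UCB-style) selection
    obs = payoff(x) + 0.01 * rng.standard_normal()  # noisy empirical query
    X.append(x)
    y.append(obs)
    regret += best - payoff(x)

avg_regret = regret / 30
```

As the surrogate's uncertainty shrinks around the true maximizer, the optimistic rule concentrates its queries there and the per-round regret decays; the paper's contribution, per the abstract, is a version of this scheme for multi-agent games with a proven convergence rate.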


Published In

UAI '24: Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence
July 2024, 4270 pages
Editors: Negar Kiyavash, Joris M. Mooij

Sponsors

  • HUAWEI
  • Google
  • DEShaw&Co
  • Barcelona School of Economics
  • Universitat Pompeu Fabra

Publisher

JMLR.org
