DOI: 10.5555/3600270.3602602

Inverse game theory for Stackelberg games: the blessing of bounded rationality

Published: 28 November 2022

Abstract

Optimizing strategic decisions (a.k.a. computing equilibria) is key to the success of many non-cooperative multi-agent applications. In many real-world situations, however, we face the exact opposite of this game-theoretic problem: instead of prescribing the equilibrium of a given game, we directly observe the agents' equilibrium behaviors and want to infer the underlying parameters of an unknown game. This research question, known as inverse game theory, has been studied in several recent works in the context of Stackelberg games. Unfortunately, existing works report largely negative results, showing statistical hardness [27, 37] and computational hardness [24, 25, 26] under the assumption of a perfectly rational follower. Our work relaxes the perfect-rationality assumption to the classic quantal response model, a more realistic model of boundedly rational behavior. Interestingly, we show that the smoothness introduced by this bounded rationality model leads to provably more efficient learning of the follower's utility parameters in general Stackelberg games. Systematic experiments on synthesized games confirm our theoretical results and further suggest robustness beyond the strict quantal response model.
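As a concrete illustration of the behavior model referenced above, the following is a minimal sketch of the logit quantal response model, in which the follower plays each action with probability proportional to the exponential of its scaled utility. The rationality parameter lam, the toy utility vector u_true, and the log-odds inversion at the end are illustrative assumptions for exposition, not the paper's algorithm; the inversion simply shows why a smooth response map makes utility differences identifiable from observed play.

import numpy as np

def quantal_response(utilities, lam=1.0):
    # Logit quantal response: P(action i) is proportional to exp(lam * u_i).
    # As lam grows, the distribution concentrates on the best response,
    # recovering the perfectly rational follower as a limit.
    z = lam * np.asarray(utilities, dtype=float)
    z -= z.max()                      # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Forward direction: a boundedly rational follower facing three actions.
u_true = np.array([1.0, 0.5, 0.0])    # hypothetical follower utilities
p = quantal_response(u_true, lam=2.0)

# Inverse direction: the logit map is smooth and invertible, so the log-odds
# of observed play recover utility differences up to the scale lam:
#     log(p_i / p_j) = lam * (u_i - u_j)
u_recovered = np.log(p) / 2.0         # divide by the assumed lam = 2.0
print(u_recovered - u_recovered.min())   # approximately [1.0, 0.5, 0.0]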

Supplementary Material

Additional material (3600270.3602602_supp.pdf)
Supplemental material.

References

[1]
Pieter Abbeel and Andrew Y Ng. Apprenticeship learning via inverse reinforcement learning. In Proceedings of the twenty-first international conference on Machine learning, page 1, 2004.
[2]
Bo An, Fernando Ordóñez, Milind Tambe, Eric Shieh, Rong Yang, Craig Baldwin, Joseph DiRenzo III, Kathryn Moretti, Ben Maule, and Garrett Meyer. A deployed quantal response-based patrol planning system for the US Coast Guard. Interfaces, 43(5):400-420, 2013.
[3]
Sanjeev Arora, Elad Hazan, and Satyen Kale. The multiplicative weights update method: a meta-algorithm and applications. Theory of computing, 8(1):121-164, 2012.
[4]
Patrice Assouad. Deux remarques sur l'estimation. Comptes rendus des séances de l'Académie des sciences. Série 1, Mathématique, 296(23):1021-1024, 1983.
[5]
Robert J Aumann. Rationality and bounded rationality. In Cooperation: Game-Theoretic Approaches, pages 219-231. Springer, 1997.
[6]
Yu Bai, Chi Jin, Huan Wang, and Caiming Xiong. Sample-efficient learning of Stackelberg equilibria in general-sum games. Advances in Neural Information Processing Systems, 34, 2021.
[7]
Maria-Florina Balcan, Avrim Blum, Nika Haghtalab, and Ariel D Procaccia. Commitment without regrets: Online learning in Stackelberg security games. In Proceedings of the sixteenth ACM conference on economics and computation, pages 61-78, 2015.
[8]
Christopher M Bishop and Nasser M Nasrabadi. Pattern recognition and machine learning, volume 4. Springer, 2006.
[9]
Avrim Blum, Nika Haghtalab, and Ariel D Procaccia. Learning optimal commitment to overcome insecurity. Advances in Neural Information Processing Systems, 27, 2014.
[10]
Colin F Camerer. Behavioral game theory: Experiments in strategic interaction. Princeton university press, 2011.
[11]
Colin F Camerer, Teck-Hua Ho, and Juin-Kuan Chong. A cognitive hierarchy model of games. The Quarterly Journal of Economics, 119(3):861-898, 2004.
[12]
Jakub Černý, Viliam Lisý, Branislav Bošanský, and Bo An. Computing quantal Stackelberg equilibrium in extensive-form games. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 5260-5268, 2021.
[13]
Herman Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. The Annals of Mathematical Statistics, pages 493-507, 1952.
[14]
Vincent Conitzer. On Stackelberg mixed strategies. Synthese, 193(3):689-703, 2016.
[15]
Gerard Debreu. Individual choice behavior: A theoretical analysis, 1960.
[16]
Fei Fang, Peter Stone, and Milind Tambe. When security games go green: Designing defender strategies to prevent poaching and illegal fishing. In Twenty-fourth international joint conference on artificial intelligence, 2015.
[17]
Fei Fang, Thanh H Nguyen, Rob Pickles, Wai Y Lam, Gopalasamy R Clements, Bo An, Amandeep Singh, Brian C Schwedock, Milind Tambe, and Andrew Lemieux. PAWS—A deployed game-theoretic application to combat poaching. AI Magazine, 38(1):23-36, 2017.
[18]
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning. MIT press, 2016.
[19]
Sanford J Grossman and Oliver D Hart. An analysis of the principal-agent problem. In Foundations of insurance economics, pages 302-340. Springer, 1992.
[20]
Nika Haghtalab, Fei Fang, Thanh Hong Nguyen, Arunesh Sinha, Ariel D Procaccia, and Milind Tambe. Three strategies to success: Learning adversary models in security games. 2016.
[21]
Bengt Holmström. Moral hazard and observability. The Bell journal of economics, pages 74-91, 1979.
[22]
Eric Jang, Shixiang Gu, and Ben Poole. Categorical reparameterization with gumbel-softmax. arXiv preprint arXiv:1611.01144, 2016.
[23]
Daniel Kahneman and Amos Tversky. Prospect theory: An analysis of decision under risk. Econometrica, 47(2):263-291, 1979.
[24]
Shankar Kalyanaraman and Christopher Umans. The complexity of rationalizing matchings. In International Symposium on Algorithms and Computation, pages 171-182. Springer, 2008.
[25]
Shankar Kalyanaraman and Christopher Umans. The complexity of rationalizing network formation. In 2009 50th Annual IEEE Symposium on Foundations of Computer Science, pages 485-494. IEEE, 2009.
[26]
Volodymyr Kuleshov and Okke Schrijvers. Inverse game theory: Learning utilities in succinct games. In International Conference on Web and Internet Economics, pages 413-427. Springer, 2015.
[27]
Joshua Letchford, Vincent Conitzer, and Kamesh Munagala. Learning and approximating the optimal strategy to commit to. In International symposium on algorithmic game theory, pages 250-262. Springer, 2009.
[28]
Bernhardt Lieberman. Human behavior in a strictly determined 3×3 matrix game. Behavioral Science, 5(4):317-322, 1960.
[29]
Chun Kai Ling, Fei Fang, and J Zico Kolter. What game are we playing? End-to-end learning in normal and extensive form games. arXiv preprint arXiv:1805.02777, 2018.
[30]
Janusz Marecki, Gerry Tesauro, and Richard Segal. Playing repeated Stackelberg games with unknown opponents. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 2, pages 821-828, 2012.
[31]
Daniel L McFadden. Quantal choice analysis: A survey. Annals of Economic and Social Measurement, 5(4):363-390, 1976.
[32]
Richard D McKelvey and Thomas R Palfrey. Quantal response equilibria for normal form games. Games and economic behavior, 10(1):6-38, 1995.
[33]
Panayotis Mertikopoulos and William H Sandholm. Learning in games via reinforcement and regularization. Mathematics of Operations Research, 41(4):1297-1324, 2016.
[34]
Andrew Y Ng, Stuart J Russell, et al. Algorithms for inverse reinforcement learning. In ICML, volume 1, page 2, 2000.
[35]
Thanh Nguyen, Rong Yang, Amos Azaria, Sarit Kraus, and Milind Tambe. Analyzing the effectiveness of adversary modeling in security games. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 27, pages 718-724, 2013.
[36]
Barry O'Neill. Nonmetric test of the minimax theory of two-person zero-sum games. Proceedings of the National Academy of Sciences, 84(7):2106-2109, 1987.
[37]
Binghui Peng, Weiran Shen, Pingzhong Tang, and Song Zuo. Learning optimal strategies to commit to. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 2149-2156, 2019.
[38]
James Pita, Manish Jain, Milind Tambe, Fernando Ordónez, and Sarit Kraus. Robust solutions to Stackelberg games: Addressing bounded rationality and limited observations in human cognition. Artificial Intelligence, 174(15):1142-1171, 2010.
[39]
Arunesh Sinha, Debarun Kar, and Milind Tambe. Learning adversary behavior in security games: A pac model perspective. arXiv preprint arXiv:1511.00043, 2015.
[40]
Yevgeniy Vorobeychik, Michael P Wellman, and Satinder Singh. Learning payoff functions in infinite games. Machine Learning, 67(1):145-168, 2007.
[41]
Rong Yang, Christopher Kiekintveld, Fernando Ordonez, Milind Tambe, and Richard John. Improving resource allocation strategy against human adversaries in security games. In Twenty-Second International Joint Conference on Artificial Intelligence, 2011.
[42]
Rong Yang, Fernando Ordonez, and Milind Tambe. Computing optimal strategy against quantal response in security games. In AAMAS, pages 847-854, 2012.
[43]
Stephan Zheng, Alexander Trott, Sunil Srinivasa, Nikhil Naik, Melvin Gruesbeck, David C Parkes, and Richard Socher. The ai economist: Improving equality and productivity with ai-driven tax policies. arXiv preprint arXiv:2004.13332, 2020.

Published In

NIPS '22: Proceedings of the 36th International Conference on Neural Information Processing Systems
November 2022
39114 pages

Publisher

Curran Associates Inc.

Red Hook, NY, United States
