DOI: 10.1145/3490486.3538308 (Extended Abstract, Public Access)

Learning in Stackelberg Games with Non-myopic Agents

Published: 13 July 2022

Abstract

Stackelberg games are a canonical model for strategic principal-agent interactions. Consider, for instance, a defense system that distributes its security resources across high-risk targets prior to attacks being executed; or a tax policymaker who sets rules on when audits are triggered prior to seeing filed tax reports; or a seller who chooses a price prior to knowing a customer's proclivity to buy. In each of these scenarios, a principal first selects an action x ∈ X and then an agent reacts with an action y ∈ Y, where X and Y are the principal's and agent's action spaces, respectively. In the examples above, agent actions correspond to which target to attack, how much tax to pay to evade an audit, and how much to purchase, respectively. Typically, the principal wants an x that maximizes their payoff when the agent plays a best response y = br(x); such a pair (x, y) is a Stackelberg equilibrium. By committing to a strategy, the principal can guarantee they achieve a higher payoff than in the fixed-point equilibrium of the corresponding simultaneous-play game. However, finding such a strategy requires knowledge of the agent's payoff function.
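In symbols (with u_P and u_A denoting the principal's and agent's payoff functions; this notation is introduced here for concreteness and does not appear in the abstract itself), a Stackelberg equilibrium is a pair (x*, br(x*)) satisfying

    br(x) ∈ argmax_{y ∈ Y} u_A(x, y)    and    x* ∈ argmax_{x ∈ X} u_P(x, br(x)),

where, as is standard, ties in the agent's best response are assumed to be broken in the principal's favor.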
When faced with unknown agent payoffs, the principal can attempt to learn a best response via repeated interactions with the agent. If a (naïve) agent is unaware that such learning occurs and always plays a best response, the principal can use classical online learning approaches to optimize their own payoff in the stage game. Learning from myopic agents has been extensively studied in multiple Stackelberg games, including security games [2, 6, 7], demand learning [1, 5], and strategic classification [3, 4].
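As a concrete illustration of the myopic case, consider demand learning [1, 5]: the principal posts a price, a myopic buyer purchases exactly when their private value covers it, and a standard stochastic bandit algorithm over a finite price grid learns a near-optimal price from repeated interactions. The sketch below is a minimal illustration of this idea, not an algorithm from the paper; the uniform valuation model, the price grid, and the choice of UCB1 are assumptions made for the example.

    import math
    import random

    # Illustrative stage game (an assumption for this sketch, not the paper's setup):
    # the principal posts a price x from a finite grid; a myopic buyer with private
    # value v ~ Uniform[0, 1] buys iff v >= x, and the principal's payoff is revenue.
    PRICES = [0.1 * k for k in range(1, 10)]

    def myopic_best_response(price, value):
        # y = br(x): the buyer purchases exactly when the value covers the price.
        return 1 if value >= price else 0

    def learn_price_ucb1(rounds=20000, seed=0):
        # Standard UCB1 over the price grid; per-round revenue lies in [0, 1].
        rng = random.Random(seed)
        counts = [0] * len(PRICES)
        totals = [0.0] * len(PRICES)
        for t in range(1, rounds + 1):
            if t <= len(PRICES):
                i = t - 1  # play each price once before using confidence bounds
            else:
                i = max(range(len(PRICES)),
                        key=lambda j: totals[j] / counts[j]
                        + math.sqrt(2.0 * math.log(t) / counts[j]))
            value = rng.random()
            revenue = PRICES[i] * myopic_best_response(PRICES[i], value)
            counts[i] += 1
            totals[i] += revenue
        best = max(range(len(PRICES)), key=lambda j: totals[j] / counts[j])
        return PRICES[best]

    if __name__ == "__main__":
        # Under this valuation model the expected revenue at price p is p * (1 - p),
        # so the learner should settle near the grid price closest to 0.5.
        print("learned price:", learn_price_ucb1())

Against a non-myopic buyer, this kind of procedure is exactly what can be manipulated: by declining purchases at exploratory prices, the buyer can steer the learner toward lower posted prices, which is the difficulty discussed next.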
However, long-lived agents will generally not volunteer information that can be used against them in the future. This is especially the case in online environments, where a learner seeks to exploit recently learned patterns of behavior as soon as possible and the agent sees a tangible advantage in deviating from its instantaneous best response to lead the learner astray. This trade-off between the (statistical) efficiency of learning algorithms and the perverse incentives they may create over the long term brings us to the main questions of this work: What are principled approaches to learning against non-myopic agents in general Stackelberg games? How can insights from learning against myopic agents be applied to learning in the non-myopic case?

References

[1] Omar Besbes and Assaf Zeevi. Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Operations Research, 57(6):1407--1420, 2009.
[2] Avrim Blum, Nika Haghtalab, and Ariel D. Procaccia. Learning optimal commitment to overcome insecurity. In Advances in Neural Information Processing Systems, volume 27, pages 1826--1834, 2014.
[3] Yiling Chen, Yang Liu, and Chara Podimata. Learning strategy-aware linear classifiers. In Advances in Neural Information Processing Systems, volume 33, pages 15265--15276, 2020.
[4] Jinshuo Dong, Aaron Roth, Zachary Schutzman, Bo Waggoner, and Zhiwei Steven Wu. Strategic classification from revealed preferences. In Proceedings of the 2018 ACM Conference on Economics and Computation, pages 55--70, 2018.
[5] Robert D. Kleinberg and Frank Thomson Leighton. The value of knowing a demand curve: Bounds on regret for online posted-price auctions. In 44th Symposium on Foundations of Computer Science, pages 594--605, 2003.
[6] Joshua Letchford, Vincent Conitzer, and Kamesh Munagala. Learning and approximating the optimal strategy to commit to. In International Symposium on Algorithmic Game Theory, pages 250--262. Springer, 2009.
[7] Binghui Peng, Weiran Shen, Pingzhong Tang, and Song Zuo. Learning optimal strategies to commit to. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 2149--2156, 2019.

Published In

EC '22: Proceedings of the 23rd ACM Conference on Economics and Computation
July 2022, 1269 pages
ISBN: 9781450391504
DOI: 10.1145/3490486

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. Stackelberg games
  2. bandit optimization
  3. non-myopic learning
  4. security games
