Article

Hedged learning: regret-minimization with learning experts

Authors: Yu-Han Chang, Leslie Pack Kaelbling

ICML '05: Proceedings of the 22nd international conference on Machine learning

Pages 121 - 128

https://doi.org/10.1145/1102351.1102367

Published: 07 August 2005


Abstract

In non-cooperative multi-agent situations, there cannot exist a globally optimal, yet opponent-independent learning algorithm. Regret-minimization over a set of strategies optimized for potential opponent models is proposed as a good framework for deciding how to behave in such situations. Using longer playing horizons and experts that learn as they play, the regret-minimization framework can be extended to overcome several shortcomings of earlier approaches to the problem of multi-agent learning.
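
As a rough illustration of the regret-minimization idea in the abstract, the sketch below runs the standard multiplicative-weights (Hedge) update over a fixed set of experts and reports regret against the best single expert in hindsight. It is not the authors' hedged-learning algorithm. The function names `hedge` and `regret`, the full-information reward table, and the learning rate `eta` are assumptions made for illustration, and the extensions the abstract mentions (longer playing horizons, experts that learn as they play) are not modeled.

```python
import math


def hedge(rewards, eta=0.5):
    """Multiplicative-weights (Hedge) over a fixed set of experts.

    rewards[t][i] is the reward in [0, 1] that expert i would have earned
    in round t (hypothetical, full-information data).
    Returns the algorithm's expected total reward and the final weights.
    """
    n_experts = len(rewards[0])
    weights = [1.0] * n_experts
    expected_total = 0.0
    for round_rewards in rewards:
        # Follow each expert with probability proportional to its weight.
        norm = sum(weights)
        probs = [w / norm for w in weights]
        expected_total += sum(p * r for p, r in zip(probs, round_rewards))
        # Exponentially up-weight experts that did well this round.
        weights = [w * math.exp(eta * r) for w, r in zip(weights, round_rewards)]
    return expected_total, weights


def regret(rewards, achieved):
    """Reward of the best single expert in hindsight minus the reward achieved."""
    n_experts = len(rewards[0])
    best = max(sum(r[i] for r in rewards) for i in range(n_experts))
    return best - achieved


if __name__ == "__main__":
    # Toy data: expert 1 is always right, expert 0 is always wrong.
    table = [[0.0, 1.0] for _ in range(100)]
    total, final_weights = hedge(table)
    print("expected reward:", total)
    print("regret vs. best expert:", regret(table, total))
```

On this toy data the regret stays bounded as the number of rounds grows, which is the property a regret-minimizing learner aims for. The weights are left unnormalized for brevity, so a much longer horizon would call for renormalizing them each round to avoid overflow.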


Index Terms

Computing methodologies > Artificial intelligence > Distributed artificial intelligence


Published In

ICML '05: Proceedings of the 22nd international conference on Machine learning

August 2005

1113 pages

ISBN: 1595931805

DOI: 10.1145/1102351

General Chair: Saso Dzeroski (Jozef Stefan Institute, Slovenia)

Program Chairs: Luc De Raedt, Stefan Wrobel

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%


Bibliometrics

Total Citations: 4

Total Downloads: 202

Downloads (Last 12 months): 7

Downloads (Last 6 weeks): 0

Reflects downloads up to 26 Jul 2024

Cited By

Chang Y. (2019). No regrets about no-regret. Artificial Intelligence, 171:7 (434-439). DOI: 10.1016/j.artint.2006.12.007. Online publication date: 2-Jan-2019.
https://dl.acm.org/doi/10.1016/j.artint.2006.12.007
Powers R., Shoham Y., Vu T. (2018). A general criterion and an algorithmic framework for learning in multi-agent systems. Machine Learning, 67:1-2 (45-76). DOI: 10.1007/s10994-006-9643-2. Online publication date: 31-Dec-2018.
https://dl.acm.org/doi/10.1007/s10994-006-9643-2
Chang Y., Maheswaran R., Sonenberg L., Stone P., Tumer K., Yolum P. (2011). The social Ultimatum Game and adaptive agents. The 10th International Conference on Autonomous Agents and Multiagent Systems - Volume 3 (1313-1314). DOI: 10.5555/2034396.2034542. Online publication date: 2-May-2011.
https://dl.acm.org/doi/10.5555/2034396.2034542
Bouzy B., Métivier M. (2010). Multi-agent learning experiments on repeated matrix games. Proceedings of the 27th International Conference on Machine Learning (119-126). DOI: 10.5555/3104322.3104339. Online publication date: 21-Jun-2010.
https://dl.acm.org/doi/10.5555/3104322.3104339
