Non-Compliant Bandits
Published In
- General Chairs: Ingo Frommholz, Frank Hopfgartner, Mark Lee, Michael Oakes
- Program Chairs: Mounia Lalmas, Min Zhang, Rodrygo Santos
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
- Research-article