More Adaptive Algorithms for Adversarial Bandits

Wei, Chen-Yu; Luo, Haipeng

arXiv:1801.03265 (cs)

[Submitted on 10 Jan 2018 (v1), last revised 7 Jun 2018 (this version, v3)]

Abstract: We develop a novel and generic algorithm for the adversarial multi-armed bandit problem (or, more generally, the combinatorial semi-bandit problem). When instantiated differently, our algorithm achieves various new data-dependent regret bounds that improve on previous work. Examples include: 1) a regret bound depending on the variance of only the best arm; 2) a regret bound depending on the first-order path-length of only the best arm; 3) a regret bound depending on the sum of first-order path-lengths of all arms as well as an important negative term, which together lead to faster convergence rates for some normal-form games with partial feedback; 4) a regret bound that simultaneously implies small regret when the best arm has small loss and logarithmic regret when there exists an arm whose expected loss is always smaller than those of the others by a fixed gap (e.g., the classic i.i.d. setting). In some cases, such as the last two results, our algorithm is completely parameter-free.
The main idea of our algorithm is to apply the optimism and adaptivity techniques to the well-known Online Mirror Descent framework with a special log-barrier regularizer. The challenges are to come up with appropriate optimistic predictions and correction terms in this framework. Some of our results also crucially rely on using a sophisticated increasing learning rate schedule.
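
To make the underlying framework concrete, here is a minimal sketch of plain Online Mirror Descent with a log-barrier regularizer for a K-armed bandit. It is an illustration under stated assumptions, not the algorithm of the paper: it uses a fixed learning rate `eta`, the standard importance-weighted loss estimator, and no optimistic predictions, correction terms, or increasing learning rate schedule. The function names `log_barrier_omd_step` and `run_bandit` are hypothetical and chosen only for this example.

```python
import numpy as np


def log_barrier_omd_step(w, loss_est, eta, iters=100):
    """One OMD step with regularizer psi(w) = (1/eta) * sum_i ln(1/w_i).

    Assumption: fixed scalar learning rate eta (illustrative only).
    """
    inv_w = 1.0 / w
    # First-order optimality on the simplex gives
    #   1 / w_new_i = 1 / w_i + eta * (loss_est_i - lam),
    # where the scalar lam is chosen so that w_new sums to 1.
    lo = loss_est.min()                  # here sum(w_new) <= 1
    hi = (loss_est + inv_w / eta).min()  # here sum(w_new) blows up
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        total = np.sum(1.0 / (inv_w + eta * (loss_est - lam)))
        if total > 1.0:
            hi = lam
        else:
            lo = lam
    w_new = 1.0 / (inv_w + eta * (loss_est - lo))
    return w_new / w_new.sum()  # remove residual bisection error


def run_bandit(losses, eta=0.1, seed=0):
    """Play a T x K loss matrix, observing only the pulled arm's loss."""
    rng = np.random.default_rng(seed)
    T, K = losses.shape
    w = np.full(K, 1.0 / K)
    total_loss = 0.0
    for t in range(T):
        arm = rng.choice(K, p=w)
        total_loss += losses[t, arm]
        # Importance-weighted estimator: unbiased for the full loss vector.
        loss_est = np.zeros(K)
        loss_est[arm] = losses[t, arm] / w[arm]
        w = log_barrier_omd_step(w, loss_est, eta)
    return w, total_loss
```

Calling `run_bandit(losses)` on a `T x K` array of losses in [0, 1] returns the final sampling distribution and the cumulative loss incurred. The normalization constant is found by bisection because the log-barrier update, unlike the entropic (exponential-weights) update, has no closed-form normalizer.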

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1801.03265 [cs.LG]
	(or arXiv:1801.03265v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1801.03265

Submission history

From: Chen-Yu Wei
[v1] Wed, 10 Jan 2018 08:28:00 UTC (44 KB)
[v2] Tue, 20 Feb 2018 00:23:45 UTC (44 KB)
[v3] Thu, 7 Jun 2018 16:57:01 UTC (46 KB)
