lil' UCB : An Optimal Exploration Algorithm for Multi-Armed Bandits

Jamieson, Kevin; Malloy, Matthew; Nowak, Robert; Bubeck, Sébastien


arXiv:1312.7308 (stat)

[Submitted on 27 Dec 2013]


Abstract: The paper proposes a novel upper confidence bound (UCB) procedure for identifying the arm with the largest mean in a multi-armed bandit game in the fixed confidence setting using a small number of total samples. The procedure cannot be improved in the sense that the number of samples required to identify the best arm is within a constant factor of a lower bound based on the law of the iterated logarithm (LIL). Inspired by the LIL, we construct our confidence bounds to explicitly account for the infinite time horizon of the algorithm. In addition, by using a novel stopping time for the algorithm we avoid a union bound over the arms that has been observed in other UCB-type algorithms. We prove that the algorithm is optimal up to constants and also show through simulations that it provides superior performance with respect to the state-of-the-art.
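The ideas in the abstract (an upper confidence bound shaped by the LIL, plus a stopping rule that avoids a union bound over arms) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the function names `lil_radius` and `lil_ucb_sketch`, the exact form of the confidence radius (including the `+ 2` guard inside the logarithm), the parameter values `eps`, `beta`, `sigma`, `lam`, `delta`, the `max_pulls` safety cap, and the pull-count stopping rule. It should be read as one plausible instance of a LIL-style UCB procedure, not as the authors' exact algorithm.

```python
import math
import random


def lil_radius(pulls, delta, eps=0.01, beta=1.0, sigma=0.5):
    """LIL-style confidence radius after `pulls` samples (assumed form)."""
    # The "+ 2" keeps the inner logarithm above 1 for small counts (an assumption).
    inner = math.log((1.0 + eps) * pulls + 2.0) / delta
    scale = (1.0 + beta) * (1.0 + math.sqrt(eps))
    return scale * math.sqrt(
        2.0 * sigma ** 2 * (1.0 + eps) * math.log(inner) / pulls
    )


def lil_ucb_sketch(sample, n_arms, delta=0.05, lam=9.0, max_pulls=100_000):
    """Return (guessed best arm, pull counts) for a hypothetical `sample(arm)`."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms

    # Pull every arm once so each empirical mean is defined.
    for arm in range(n_arms):
        sums[arm] += sample(arm)
        counts[arm] += 1
    total = n_arms

    while total < max_pulls:
        # Assumed stopping rule: one arm has been pulled far more often
        # than all other arms combined.
        if any(c >= 1 + lam * (total - c) for c in counts):
            break
        # Pull the arm with the largest upper confidence bound.
        arm = max(
            range(n_arms),
            key=lambda i: sums[i] / counts[i] + lil_radius(counts[i], delta),
        )
        sums[arm] += sample(arm)
        counts[arm] += 1
        total += 1

    # Report the most-pulled arm as the estimated best arm.
    best = max(range(n_arms), key=lambda i: counts[i])
    return best, counts


# Usage: three Gaussian arms with made-up means; arm 2 has the largest mean.
rng = random.Random(0)
means = [0.0, 0.0, 1.0]
best, counts = lil_ucb_sketch(lambda i: rng.gauss(means[i], 0.5), n_arms=3)
print(best, counts)
```

The radius shrinks roughly like sqrt(log(log(T)) / T) in the number of pulls T of an arm, which is the scaling suggested by the LIL. The stopping rule looks only at pull counts, so no per-arm confidence intervals need to hold simultaneously when the procedure stops.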

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:1312.7308 [stat.ML]
	(or arXiv:1312.7308v1 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.1312.7308

Submission history

From: Kevin Jamieson
[v1] Fri, 27 Dec 2013 18:20:09 UTC (995 KB)
