Generalized Thompson Sampling for Contextual Bandits

Li, Lihong

arXiv:1310.7163 (cs)

[Submitted on 27 Oct 2013]

Abstract: Thompson Sampling, one of the oldest heuristics for solving multi-armed bandits, has recently been shown to achieve state-of-the-art performance. This empirical success has led to great interest in a theoretical understanding of the heuristic. In this paper, we approach the problem in a way very different from existing efforts. In particular, motivated by the connection between Thompson Sampling and exponentiated updates, we propose a new family of algorithms called Generalized Thompson Sampling in the expert-learning framework, which includes Thompson Sampling as a special case. Like most expert-learning algorithms, Generalized Thompson Sampling uses a loss function to adjust the experts' weights. General regret bounds are derived and then instantiated for two important loss functions: square loss and logarithmic loss. In contrast to existing bounds, our results apply to quite general contextual bandits. More importantly, they quantify the effect of the "prior" distribution on the regret bounds.
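
The idea described in the abstract (sample an expert according to its weight, act on its advice, then reweight all experts by an exponentiated loss) can be illustrated with a minimal Python sketch. This is not the paper's algorithm as published: the abstract does not give the exact procedure, so the function names (generalized_ts, update_weights), the learning rate eta, the binary-reward setting, and the greedy choice of arm under the sampled expert are all assumptions made for illustration.

```python
import math
import random


def log_loss(p, r):
    """Logarithmic loss of predicting success probability p for binary reward r."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # clip to avoid log(0)
    return -math.log(p) if r == 1 else -math.log(1.0 - p)


def square_loss(p, r):
    """Square loss of predicting success probability p for binary reward r."""
    return (p - r) ** 2


def update_weights(weights, predictions, reward, loss, eta):
    """Exponentiated update: shrink each weight by exp(-eta * loss), then renormalize."""
    new = [w * math.exp(-eta * loss(p, reward)) for w, p in zip(weights, predictions)]
    total = sum(new)
    return [w / total for w in new]


def generalized_ts(experts, prior, contexts, reward_fn, n_arms, loss, eta, rng=None):
    """Illustrative loop (assumed form, not taken from the paper).

    experts   : list of functions (context, arm) -> predicted reward probability
    prior     : initial weights over experts (the "prior" distribution)
    reward_fn : hypothetical environment, (context, arm) -> reward in {0, 1}
    """
    rng = rng or random.Random()
    weights = list(prior)
    total_reward = 0
    for x in contexts:
        # Sample one expert in proportion to its current weight.
        i = rng.choices(range(len(experts)), weights=weights)[0]
        # Play the arm that the sampled expert predicts to be best.
        arm = max(range(n_arms), key=lambda a: experts[i](x, a))
        r = reward_fn(x, arm)
        total_reward += r
        # Reweight every expert by its loss on the arm that was actually played.
        predictions = [f(x, arm) for f in experts]
        weights = update_weights(weights, predictions, r, loss, eta)
    return weights, total_reward
```

With logarithmic loss and eta = 1, the update multiplies each weight by the probability that the expert assigned to the observed reward. That is Bayes' rule with the weights playing the role of a posterior over experts, which is one way to read the abstract's remark that Thompson Sampling arises as a special case. Other losses or learning rates, such as square_loss above, give different members of the family.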

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML); Other Statistics (stat.OT)
MSC classes:	62L05
ACM classes:	I.2.6
Cite as:	arXiv:1310.7163 [cs.LG]
	(or arXiv:1310.7163v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1310.7163

Submission history

From: Lihong Li
[v1] Sun, 27 Oct 2013 06:29:55 UTC (17 KB)
