Bandits with concave rewards and convex knapsacks

Agrawal, Shipra; Devanur, Nikhil R.


arXiv:1402.5758 (cs)

[Submitted on 24 Feb 2014]

Title: Bandits with concave rewards and convex knapsacks

Authors: Shipra Agrawal, Nikhil R. Devanur


Abstract: In this paper, we consider a very general model for the exploration-exploitation tradeoff that allows arbitrary concave rewards and convex constraints on the decisions across time, in addition to the customary limitation on the time horizon. This model subsumes the classic multi-armed bandit (MAB) model and the Bandits with Knapsacks (BwK) model of Badanidiyuru et al. [2013]. We also consider an extension of this model to allow linear contexts, similar to the linear contextual extension of the MAB model. We demonstrate that a natural and simple extension of the upper confidence bound (UCB) family of algorithms for MAB provides a polynomial-time algorithm that has near-optimal regret guarantees for this substantially more general model. Quite surprisingly, it matches the bounds provided by Badanidiyuru et al. [2013] for the special case of BwK. We also provide computationally more efficient algorithms by establishing connections between this problem and other well-studied problems and algorithms, such as the Blackwell approachability problem, online convex optimization, and the Frank-Wolfe technique for convex optimization. We give examples of several concrete applications where this more general model of bandits allows for richer and/or more efficient formulations of the problem.
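For orientation, the UCB family that the abstract builds on can be illustrated in the classic MAB special case. The following is a minimal sketch of the textbook UCB1 rule only; it is not the paper's algorithm for concave rewards or convex knapsacks. The names `ucb1` and `make_bernoulli_arms`, the Bernoulli reward model, and the confidence radius sqrt(2 ln t / n) are assumptions chosen for illustration.

```python
import math
import random


def ucb1(pull, n_arms, horizon):
    """Run textbook UCB1 for `horizon` rounds.

    `pull(arm)` is a hypothetical callback returning a reward in [0, 1].
    Returns the per-arm play counts and empirical mean rewards.
    """
    counts = [0] * n_arms    # number of times each arm was played
    means = [0.0] * n_arms   # empirical mean reward of each arm
    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Initialization: play every arm once so each count is positive.
            arm = t - 1
        else:
            # Optimism: pick the arm with the largest upper confidence bound,
            # i.e. empirical mean plus a radius that shrinks with more plays.
            arm = max(
                range(n_arms),
                key=lambda i: means[i] + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = pull(arm)
        counts[arm] += 1
        # Incremental update of the running mean.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means


def make_bernoulli_arms(probs, seed=0):
    """Illustrative environment: arm i pays 1 with probability probs[i]."""
    rng = random.Random(seed)
    return lambda arm: 1.0 if rng.random() < probs[arm] else 0.0
```

As a usage example, `ucb1(make_bernoulli_arms([0.1, 0.5, 0.9]), 3, 5000)` returns play counts that concentrate on the last arm. In this special case the only constraint is the time horizon, which is the "customary limitation" the abstract mentions.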

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:1402.5758 [cs.LG]
	(or arXiv:1402.5758v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1402.5758

Submission history

From: Shipra Agrawal
[v1] Mon, 24 Feb 2014 09:27:18 UTC (99 KB)
