Minimizing Regret of Bandit Online Optimization in Unconstrained Action Spaces

Tatarenko, Tatiana; Kamgarpour, Maryam

arXiv:1806.05069 (math)

[Submitted on 13 Jun 2018 (v1), last revised 2 May 2020 (this version, v3)]

Abstract: We consider online convex optimization with zero-order oracle feedback. In particular, the decision maker knows neither the explicit representation of the time-varying cost functions nor their gradients. At each time step, she observes only the value of the corresponding cost function evaluated at her chosen action (zero-order oracle). The objective is to minimize the regret, that is, the difference between the sum of the costs she accumulates and that of a static optimal action had she known the sequence of cost functions a priori. We present a novel algorithm to minimize regret in unconstrained action spaces. Our algorithm hinges on a classical idea of one-point estimation of the gradients of the cost functions based on their observed values. The algorithm is independent of problem parameters. Letting $T$ denote the number of queries of the zero-order oracle and $n$ the problem dimension, the regret rate achieved is $O(n^{2/3}T^{2/3})$. Moreover, we adapt the presented algorithm to the setting with two-point feedback and demonstrate that the adapted procedure achieves the theoretical lower bound on the regret, of order $n^{1/2}T^{1/2}$.
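To make the two feedback models in the abstract concrete, the following is a minimal Python sketch of a one-point and a two-point gradient estimate built from cost values alone, together with the static regret. It is an illustration under stated assumptions, not the authors' algorithm: the abstract does not specify the sampling distribution or step sizes, so the sketch assumes the textbook estimator that perturbs the action along a uniformly random unit direction. All names (`unit_direction`, `one_point_estimate`, `two_point_estimate`, `static_regret`, `delta`) are hypothetical.

```python
import math
import random


def unit_direction(n, rng):
    """Draw a direction uniformly from the unit sphere in R^n."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(v * v for v in g))
        if norm > 0.0:
            return [v / norm for v in g]


def one_point_estimate(cost, x, delta, rng):
    """Gradient estimate from ONE cost value (assumed sphere sampling):
    (n / delta) * cost(x + delta * u) * u."""
    n = len(x)
    u = unit_direction(n, rng)
    value = cost([xi + delta * ui for xi, ui in zip(x, u)])
    return [(n / delta) * value * ui for ui in u]


def two_point_estimate(cost, x, delta, rng):
    """Gradient estimate from TWO cost values of the same function:
    (n / (2 * delta)) * (cost(x + delta * u) - cost(x - delta * u)) * u."""
    n = len(x)
    u = unit_direction(n, rng)
    plus = cost([xi + delta * ui for xi, ui in zip(x, u)])
    minus = cost([xi - delta * ui for xi, ui in zip(x, u)])
    return [(n / (2.0 * delta)) * (plus - minus) * ui for ui in u]


def static_regret(incurred, comparator):
    """Accumulated cost minus the cost of one fixed action over the same rounds."""
    return sum(incurred) - sum(comparator)
```

For a linear cost both estimates average out to the true gradient over many random directions. The one-point version typically has much larger variance, because the raw cost value is divided by `delta`, which is one intuition for the gap between the $T^{2/3}$ and $T^{1/2}$ rates quoted in the abstract.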

Subjects:	Optimization and Control (math.OC); Machine Learning (cs.LG)
Cite as:	arXiv:1806.05069 [math.OC]
	(or arXiv:1806.05069v3 [math.OC] for this version)
	https://doi.org/10.48550/arXiv.1806.05069

Submission history

From: Maryam Kamgarpour
[v1] Wed, 13 Jun 2018 14:14:38 UTC (26 KB)
[v2] Fri, 10 Aug 2018 09:40:07 UTC (28 KB)
[v3] Sat, 2 May 2020 18:02:08 UTC (22 KB)
