A Tractable Online Learning Algorithm for the Multinomial Logit Contextual Bandit

Agrawal, Priyank; Tulabandhula, Theja; Avadhanula, Vashist

arXiv:2011.14033 (cs)

[Submitted on 28 Nov 2020 (v1), last revised 14 Apr 2024 (this version, v7)]

Abstract: In this paper, we consider the contextual variant of the MNL-Bandit problem. More specifically, we consider a dynamic set optimization problem, where a decision-maker offers a subset (assortment) of products to a consumer and observes the response in every round. Consumers purchase products to maximize their utility. We assume that a set of attributes describes the products, and the mean utility of a product is linear in the values of these attributes. We model consumer choice behavior using the widely used Multinomial Logit (MNL) model and consider the decision-maker's problem of dynamically learning the model parameters while optimizing cumulative revenue over the selling horizon $T$. Though this problem has attracted considerable attention in recent times, many existing methods involve solving an intractable non-convex optimization problem. Their theoretical performance guarantees depend on a problem-dependent parameter that can be prohibitively large. In particular, existing algorithms for this problem have regret bounded by $O(\sqrt{\kappa d T})$, where $\kappa$ is a problem-dependent constant that can have an exponential dependency on the number of attributes. In this paper, we propose an optimistic algorithm and show that the regret is bounded by $O(\sqrt{dT} + \kappa)$, significantly improving the performance over existing methods. Further, we propose a convex relaxation of the optimization step, which allows for tractable decision-making while retaining the favourable regret guarantee.
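To make the choice model in the abstract concrete, here is a minimal sketch of the standard MNL setup with linear utilities. It assumes that the no-purchase option has utility 0, that item i has attribute vector x_i with mean utility x_i^T theta, that each item carries a known per-unit revenue r_i, and that assortments are capped at K items. The helper names (mnl_choice_probs, expected_revenue, best_assortment) are hypothetical and for illustration only. The code treats theta as known and finds the best assortment by brute force, so it does not implement the paper's optimistic learning algorithm or its convex relaxation.

```python
import itertools
import math
from typing import Dict, Optional, Sequence, Tuple


def mnl_choice_probs(
    features: Sequence[Sequence[float]],
    theta: Sequence[float],
    assortment: Sequence[int],
) -> Dict[Optional[int], float]:
    """MNL purchase probabilities for each offered item; key None is no-purchase."""
    # Mean utility is linear in the attributes: v_i = x_i^T theta (assumed form).
    weights = {
        i: math.exp(sum(f * t for f, t in zip(features[i], theta)))
        for i in assortment
    }
    # The constant 1 is exp(0), the weight of the no-purchase option.
    denom = 1.0 + sum(weights.values())
    probs: Dict[Optional[int], float] = {i: w / denom for i, w in weights.items()}
    probs[None] = 1.0 / denom
    return probs


def expected_revenue(
    features: Sequence[Sequence[float]],
    theta: Sequence[float],
    revenues: Sequence[float],
    assortment: Sequence[int],
) -> float:
    """Expected single-round revenue sum_i r_i * P(i | S) of offering assortment S."""
    probs = mnl_choice_probs(features, theta, assortment)
    return sum(revenues[i] * probs[i] for i in assortment)


def best_assortment(
    features: Sequence[Sequence[float]],
    theta: Sequence[float],
    revenues: Sequence[float],
    capacity: int,
) -> Tuple[Tuple[int, ...], float]:
    """Brute-force search over all non-empty assortments of size <= capacity.

    Exponential in the number of items; for illustration on tiny instances only.
    """
    best_set: Tuple[int, ...] = ()
    best_rev = 0.0
    for k in range(1, capacity + 1):
        for subset in itertools.combinations(range(len(features)), k):
            rev = expected_revenue(features, theta, revenues, subset)
            if rev > best_rev:
                best_set, best_rev = subset, rev
    return best_set, best_rev
```

For example, with features [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], theta = [0.0, 0.0], revenues [1.0, 2.0, 3.0], and capacity 2, every item has weight 1. The pair (1, 2) is then optimal with expected revenue 5/3. In the bandit setting, theta is unknown and must be learned from observed purchases. Regret compares the revenue of the assortments actually offered against this oracle choice over the horizon T.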

Comments:	Bug fixed
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2011.14033 [cs.LG]
	(or arXiv:2011.14033v7 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2011.14033

Submission history

From: Priyank Agrawal
[v1] Sat, 28 Nov 2020 00:20:36 UTC (90 KB)
[v2] Fri, 19 Feb 2021 07:48:58 UTC (100 KB)
[v3] Sun, 7 Mar 2021 19:53:26 UTC (95 KB)
[v4] Sun, 19 Jun 2022 03:30:35 UTC (107 KB)
[v5] Mon, 27 Mar 2023 17:47:44 UTC (218 KB)
[v6] Fri, 18 Aug 2023 16:10:22 UTC (218 KB)
[v7] Sun, 14 Apr 2024 14:47:24 UTC (218 KB)