Contextual bandits in a collaborative environment

Q Wu, H Wang, Q Gu, H Wang - … of the 39th International ACM SIGIR …, 2016 - dl.acm.org
Contextual bandit algorithms provide principled online learning solutions for finding optimal trade-offs between exploration and exploitation with companion side information. They have been extensively used in many important practical scenarios, such as display advertising and content recommendation. A common practice estimates the unknown bandit parameters pertaining to each user independently. This unfortunately ignores dependencies among users and thus leads to suboptimal solutions, especially for applications with strong social components.
In this paper, we develop a collaborative contextual bandit algorithm, in which the adjacency graph among users is leveraged to share context and payoffs among neighboring users during online updating. We rigorously prove an improved upper regret bound for the proposed collaborative bandit algorithm compared to conventional independent bandit algorithms. Extensive experiments on both synthetic and three large-scale real-world datasets verify the improvement of our proposed algorithm over several state-of-the-art contextual bandit algorithms.
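The sharing mechanism the abstract describes can be illustrated with a simplified sketch: each user runs a LinUCB-style linear bandit, and every observed (context, payoff) pair also updates graph neighbors at a down-weighted rate. This is a hypothetical simplification for illustration, not the paper's exact CoLin update; the class name, the row-stochastic weight matrix `W`, and the parameters `alpha` and `lam` are assumptions.

```python
import numpy as np

class CollaborativeLinUCB:
    """Illustrative sketch of a collaborative contextual bandit.

    Each user keeps ridge-regression statistics (A, b) as in LinUCB;
    an observation made by one user is also applied to graph neighbors,
    weighted by the adjacency matrix. Hypothetical simplification, not
    the authors' exact algorithm.
    """

    def __init__(self, n_users, dim, adjacency, alpha=0.5, lam=1.0):
        self.alpha = alpha          # exploration weight (UCB width)
        self.W = adjacency          # user-user weight matrix, rows sum to 1
        # per-user statistics: A = lam*I + sum of x x^T, b = sum of r*x
        self.A = np.stack([lam * np.eye(dim) for _ in range(n_users)])
        self.b = np.zeros((n_users, dim))

    def select(self, user, arms):
        """arms: (n_arms, dim) context matrix; returns chosen arm index."""
        A_inv = np.linalg.inv(self.A[user])
        theta = A_inv @ self.b[user]            # ridge estimate for this user
        # score = estimated payoff + exploration bonus sqrt(x^T A^{-1} x)
        bonus = np.sqrt(np.einsum("ad,dk,ak->a", arms, A_inv, arms))
        return int(np.argmax(arms @ theta + self.alpha * bonus))

    def update(self, user, x, reward):
        # propagate the observation to all connected users, down-weighted
        # by graph proximity; this is the collaborative sharing step
        for v, w in enumerate(self.W[user]):
            if w > 0:
                self.A[v] += w * np.outer(x, x)
                self.b[v] += w * reward * x
```

With two connected users, observations made by one user sharpen the neighbor's estimate as well, so the neighbor can identify the better arm without having pulled it itself, which is the intuition behind the improved regret bound.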