DOI: 10.1145/3383313.3412214
Short paper · Open access

Deep Bayesian Bandits: Exploring in Online Personalized Recommendations

Published: 22 September 2020

Abstract

Recommender systems trained in a continuous-learning fashion are plagued by the feedback loop problem, also known as algorithmic bias. This causes a newly trained model to act greedily and favor items that users have already engaged with, a behavior that is particularly harmful in personalized ad recommendations, as it can also cause new campaigns to remain unexplored. Exploration aims to address this limitation by gathering new information about the environment, which encompasses user preferences, and can lead to higher long-term reward. In this work, we formulate a display advertising recommender as a contextual bandit and implement exploration techniques that require sampling from the posterior distribution of click-through rates in a computationally tractable manner. Traditional large-scale deep learning models do not provide uncertainty estimates by default; we approximate the uncertainty of the predictions by employing a bootstrapped model with multiple heads and dropout units. We benchmark a number of different models in an offline simulation environment using a publicly available dataset of user-ad engagements. We test our proposed deep Bayesian bandits algorithm both in the offline simulation and in an online A/B test with large-scale production traffic, where we demonstrate a positive gain from our exploration model.


Published In

RecSys '20: Proceedings of the 14th ACM Conference on Recommender Systems
September 2020, 796 pages
ISBN: 9781450375832
DOI: 10.1145/3383313
This work is licensed under a Creative Commons Attribution 4.0 International License.

      Publisher

Association for Computing Machinery, New York, NY, United States

      Author Tags

      1. Algorithmic bias
      2. Contextual bandit
      3. Recommender Systems

      Qualifiers

      • Short-paper
      • Research
      • Refereed limited

      Conference

      RecSys '20: Fourteenth ACM Conference on Recommender Systems
      September 22 - 26, 2020
      Virtual Event, Brazil

      Acceptance Rates

      Overall Acceptance Rate 254 of 1,295 submissions, 20%


