DOI: 10.1145/3383313.3412214
Short paper · Open access

Deep Bayesian Bandits: Exploring in Online Personalized Recommendations

Published: 22 September 2020

Abstract

Recommender systems trained in a continuous-learning fashion are plagued by the feedback loop problem, also known as algorithmic bias. This causes a newly trained model to act greedily and favor items that users have already engaged with, a behavior that is particularly harmful in personalized ad recommendations, as it can also cause new campaigns to remain unexplored. Exploration aims to address this limitation by gathering new information about the environment, which encompasses user preferences, and can lead to higher long-term reward. In this work, we formulate a display advertising recommender as a contextual bandit and implement exploration techniques that require sampling from the posterior distribution of click-through rates in a computationally tractable manner. Traditional large-scale deep learning models do not provide uncertainty estimates by default; we approximate the uncertainty of the predictions by employing a bootstrapped model with multiple heads and dropout units. We benchmark a number of different models in an offline simulation environment using a publicly available dataset of user-ad engagements. We test our proposed deep Bayesian bandits algorithm both in the offline simulation and in an online A/B test with large-scale production traffic, where we demonstrate a positive gain from our exploration model.


Published In

RecSys '20: Proceedings of the 14th ACM Conference on Recommender Systems
September 2020, 796 pages
ISBN: 9781450375832
DOI: 10.1145/3383313
This work is licensed under a Creative Commons Attribution 4.0 International License.

      Publisher

Association for Computing Machinery, New York, NY, United States

      Author Tags

      1. Algorithmic bias
      2. Contextual bandit
      3. Recommender Systems

      Qualifiers

      • Short-paper
      • Research
      • Refereed limited

      Conference

      RecSys '20: Fourteenth ACM Conference on Recommender Systems
      September 22 - 26, 2020
      Virtual Event, Brazil

      Acceptance Rates

      Overall Acceptance Rate 254 of 1,295 submissions, 20%


