Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3442381.3449987acmconferencesArticle/Chapter ViewAbstractPublication PagesthewebconfConference Proceedingsconference-collections
research-article
Open access

GuideBoot: Guided Bootstrap for Deep Contextual Banditsin Online Advertising

Published: 03 June 2021 Publication History

Abstract

The exploration/exploitation (E&E) dilemma lies at the core of interactive systems such as online advertising, for which contextual bandit algorithms have been proposed. Bayesian approaches provide guided exploration via uncertainty estimation, but the applicability is often limited due to over-simplified assumptions. Non-Bayesian bootstrap methods, on the other hand, can apply to complex problems by using deep reward models, but lack a clear guidance to the exploration behavior. It still remains largely unsolved to develop a practical method for complex deep contextual bandits.
In this paper, we introduce Guided Bootstrap (GuideBoot), combining the best of both worlds. GuideBoot provides explicit guidance to the exploration behavior by training multiple models over both real samples and noisy samples with fake labels, where the noise is added according to the predictive uncertainty. The proposed method is efficient as it can make decisions on-the-fly by utilizing only one randomly chosen model, but is also effective as we show that it can be viewed as a non-Bayesian approximation of Thompson sampling. Moreover, we extend it to an online version that can learn solely from streaming data, which is favored in real applications. Extensive experiments on both synthetic tasks and large-scale advertising environments show that GuideBoot achieves significant improvements against previous state-of-the-art methods.

References

[1]
Shipra Agrawal and Navin Goyal. 2013. Thompson sampling for contextual bandits with linear payoffs. In ICML. 127–135.
[2]
Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer. 2002. Finite-time analysis of the multiarmed bandit problem. Machine learning 47, 2-3 (2002), 235–256.
[3]
Charles Blundell, Julien Cornebise, Koray Kavukcuoglu, and Daan Wierstra. 2015. Weight Uncertainty in Neural Network. In International Conference on Machine Learning. 1613–1622.
[4]
Olivier Chapelle and Lihong Li. 2011. An empirical evaluation of thompson sampling. In Advances in neural information processing systems. 2249–2257.
[5]
Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, 2016. Wide & deep learning for recommender systems. In Proceedings of the 1st workshop on deep learning for recommender systems. 7–10.
[6]
Dean Eckles and Maurits Kaptein. 2014. Thompson sampling with the online bootstrap. arXiv preprint arXiv:1410.4009(2014).
[7]
Adam N Elmachtoub, Ryan McNellis, Sechan Oh, and Marek Petrik. 2017. A Practical Method for Solving Contextual Bandit Problems Using Decision Trees. In UAI.
[8]
Sarah Filippi, Olivier Cappe, Aurélien Garivier, and Csaba Szepesvári. 2010. Parametric bandits: The generalized linear case. In NIPS. 586–594.
[9]
Yarin Gal and Zoubin Ghahramani. 2016. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In international conference on machine learning. 1050–1059.
[10]
Yarin Gal, Jiri Hron, and Alex Kendall. 2017. Concrete dropout. In Advances in neural information processing systems. 3581–3590.
[11]
Marta Garnelo, Jonathan Schwarz, Dan Rosenbaum, Fabio Viola, Danilo J Rezende, SM Eslami, and Yee Whye Teh. 2018. Neural processes. arXiv preprint arXiv:1807.01622(2018).
[12]
Jason Gauci, Edoardo Conti, Yitao Liang, Kittipat Virochsiri, Yuchen He, Zachary Kaden, Vivek Narayanan, Xiaohui Ye, Zhengxing Chen, and Scott Fujimoto. 2018. Horizon: Facebook’s open source applied reinforcement learning platform. arXiv preprint arXiv:1811.00260(2018).
[13]
Botao Hao, Yasin Abbasi-Yadkori, Zheng Wen, and Guang Cheng. 2019. Bootstrapping Upper Confidence Bound. In NeurIPS. 12123–12133.
[14]
Trevor Hastie, Robert Tibshirani, and Jerome Friedman. 2009. The elements of statistical learning: data mining, inference, and prediction. Springer Science & Business Media.
[15]
Xinran He, Junfeng Pan, Ou Jin, Tianbing Xu, Bo Liu, Tao Xu, Yanxin Shi, Antoine Atallah, Ralf Herbrich, Stuart Bowers, 2014. Practical lessons from predicting clicks on ads at facebook. In Proceedings of the Eighth International Workshop on Data Mining for Online Advertising. 1–9.
[16]
Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980(2014).
[17]
Durk P Kingma, Tim Salimans, and Max Welling. 2015. Variational dropout and the local reparameterization trick. In Advances in neural information processing systems. 2575–2583.
[18]
Ariel Kleiner, Ameet Talwalkar, Purnamrita Sarkar, and Michael I Jordan. 2014. A scalable bootstrap for massive data. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 76, 4 (2014), 795–816.
[19]
Andreas Krause and Cheng S Ong. 2011. Contextual gaussian process bandit optimization. In Advances in Neural Information Processing Systems. 2447–2455.
[20]
Branislav Kveton, Csaba Szepesvari, Sharan Vaswani, Zheng Wen, Tor Lattimore, and Mohammad Ghavamzadeh. 2019. Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits. In ICML. 3601–3610.
[21]
John Langford and Tong Zhang. 2008. The epoch-greedy algorithm for multi-armed bandits with side information. In NIPS. 817–824.
[22]
Lihong Li, Wei Chu, John Langford, and Robert E Schapire. 2010. A contextual-bandit approach to personalized news article recommendation. In WWW. 661–670.
[23]
H Brendan McMahan, Gary Holt, David Sculley, Michael Young, Dietmar Ebner, Julian Grady, Lan Nie, Todd Phillips, Eugene Davydov, Daniel Golovin, 2013. Ad click prediction: a view from the trenches. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. 1222–1230.
[24]
Ian Osband, Charles Blundell, Alexander Pritzel, and Benjamin Van Roy. 2016. Deep exploration via bootstrapped DQN. In Advances in neural information processing systems. 4026–4034.
[25]
Ian Osband and Benjamin Van Roy. 2015. Bootstrapped thompson sampling and deep exploration. arXiv preprint arXiv:1507.00300(2015).
[26]
Feiyang Pan, Qingpeng Cai, Pingzhong Tang, Fuzhen Zhuang, and Qing He. 2019. Policy Gradients for Contextual Recommendations. In The World Wide Web Conference (San Francisco, CA, USA) (WWW ’19). ACM, New York, NY, USA, 1421–1431.
[27]
Feiyang Pan, Jia He, Dandan Tu, and Qing He. 2020. Trust the Model When It Is Confident: Masked Model-based Actor-Critic. In Advances in neural information processing systems.
[28]
Feiyang Pan, Shuokai Li, Xiang Ao, Pingzhong Tang, and Qing He. 2019. Warm Up Cold-start Advertisements: Improving CTR Predictions via Learning to Learn ID Embeddings. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. 695–704.
[29]
Carlos Riquelme, George Tucker, and Jasper Snoek. 2018. Deep bayesian bandits showdown: An empirical comparison of bayesian deep networks for thompson sampling. arXiv preprint arXiv:1802.09127(2018).
[30]
Parikshit Shah, Ming Yang, Sachidanand Alle, Adwait Ratnaparkhi, Ben Shahshahani, and Rohit Chandra. 2017. A practical exploration system for search advertising. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1625–1631.
[31]
Liang Tang, Yexi Jiang, Lei Li, Chunqiu Zeng, and Tao Li. 2015. Personalized recommendation via parameter-free contextual bandits. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 323–332.
[32]
Sharan Vaswani, Branislav Kveton, Zheng Wen, Anup Rao, Mark Schmidt, and Yasin Abbasi-Yadkori. 2018. New insights into bootstrapping for bandits. arXiv preprint arXiv:1805.09793(2018).
[33]
Chi-Hua Wang, Yang Yu, Botao Hao, and Guang Cheng. 2020. Residual Bootstrap Exploration for Bandit Algorithms. arXiv preprint arXiv:2002.08436(2020).
  1. GuideBoot: Guided Bootstrap for Deep Contextual Banditsin Online Advertising

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    WWW '21: Proceedings of the Web Conference 2021
    April 2021
    4054 pages
    ISBN:9781450383127
    DOI:10.1145/3442381
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 03 June 2021

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. Bootstrap
    2. Deep Contextual Bandits
    3. Online Advertising

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    WWW '21
    Sponsor:
    WWW '21: The Web Conference 2021
    April 19 - 23, 2021
    Ljubljana, Slovenia

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • 0
      Total Citations
    • 528
      Total Downloads
    • Downloads (Last 12 months)144
    • Downloads (Last 6 weeks)17
    Reflects downloads up to 13 Nov 2024

    Other Metrics

    Citations

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    HTML Format

    View this article in HTML Format.

    HTML Format

    Get Access

    Login options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media