DOI: 10.1145/3340531.3412721
Research article, CIKM Conference Proceedings

Learning to Infer User Hidden States for Online Sequential Advertising

Published: 19 October 2020
Abstract

    To drive purchases in online advertising, advertisers have a strong interest in optimizing the sequential advertising strategy, whose performance and interpretability are both important. The lack of interpretability in existing deep reinforcement learning methods makes the strategy difficult to understand, diagnose, and further optimize. In this paper, we propose our Deep Intents Sequential Advertising (DISA) method to address these issues. The key to interpretability is understanding a consumer's purchase intent, which is, however, unobservable (a hidden state). We model this intent as a latent variable and formulate the problem as a Partially Observable Markov Decision Process (POMDP), in which the underlying intents are inferred from observable behaviors. Large-scale industrial offline and online experiments demonstrate our method's superior performance over several baselines. We analyze the inferred hidden states, and the results support the rationality of our inference.
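    The abstract describes inferring unobservable purchase intents from observable behaviors under a POMDP formulation. As a generic illustration of the belief-update step any such formulation relies on (this is not the paper's DISA model; the state space, transition matrix, observation model, and all numbers below are invented for the sketch):

    ```python
    import numpy as np

    # Hypothetical 3-state purchase-intent space: browsing, considering, intending.
    # T[a][s, s'] = P(next state s' | current state s, ad action a);
    # O[o][s'] = P(observation o | hidden state s'). All values are illustrative.
    T = np.array([[[0.70, 0.25, 0.05],
                   [0.10, 0.70, 0.20],
                   [0.05, 0.15, 0.80]]])     # a single ad action, for brevity
    O = np.array([[0.8, 0.3, 0.1],           # o = 0: "no click"
                  [0.2, 0.7, 0.9]])          # o = 1: "click"

    def belief_update(b, a, o):
        """Standard POMDP belief update: b'(s') ∝ O(o|s') * sum_s T(s'|s,a) b(s)."""
        predicted = b @ T[a]                 # predict the next-state distribution
        unnormalized = predicted * O[o]      # weight by the observation likelihood
        return unnormalized / unnormalized.sum()

    b0 = np.array([1.0, 0.0, 0.0])           # user assumed to start in "browsing"
    b1 = belief_update(b0, a=0, o=1)         # observe a click after showing the ad
    # A click shifts belief mass toward the higher-intent states.
    ```

    In a deep RL setting such as the one the abstract outlines, the hand-specified matrices above would instead be learned from logged behavior, and the belief (or a learned summary of it) would feed the policy.
    
    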

    Supplementary Material

    MP4 File (3340531.3412721.mp4)


    Cited By

    View all
    • (2023) Online restless bandits with unobserved states. Proceedings of the 40th International Conference on Machine Learning, 15041-15066. DOI: 10.5555/3618408.3619021. Online publication date: 23-Jul-2023
    • (2022) KRAF: A Flexible Advertising Framework using Knowledge Graph-Enriched Multi-Agent Reinforcement Learning. Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 47-56. DOI: 10.1145/3511808.3557373. Online publication date: 17-Oct-2022

    Published In

    CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management
    October 2020
    3619 pages
    ISBN:9781450368599
    DOI:10.1145/3340531

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. online advertising
    2. partially observable Markov decision process

    Qualifiers

    • Research-article

    Conference

    CIKM '20

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Article Metrics

    • Downloads (last 12 months): 28
    • Downloads (last 6 weeks): 3
    Reflects downloads up to 10 Aug 2024
