DOI: 10.1145/3539597.3570486

Marketing Budget Allocation with Offline Constrained Deep Reinforcement Learning

Published: 27 February 2023

Abstract

We study the budget allocation problem in online marketing campaigns that utilize previously collected offline data. We first discuss the long-term effect of optimizing marketing budget allocation decisions in the offline setting and the challenges this poses. To overcome these challenges, we propose a novel game-theoretic offline value-based reinforcement learning method that uses mixed policies. Whereas previous methods must store infinitely many policies, the proposed method stores only a constant number, achieving nearly optimal policy efficiency and making it practical and favorable for industrial use. We further show that this method is guaranteed to converge to the optimal policy, a guarantee that previous value-based reinforcement learning methods for marketing budget allocation cannot provide. Our experiments on a large-scale marketing campaign with tens of millions of users and a budget of more than one billion verify the theoretical results and show that the proposed method outperforms various baseline methods. The proposed method has been successfully deployed to serve all the traffic of this marketing campaign.
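The abstract describes the method only at a high level, so the sketch below is a hypothetical illustration of the general pattern it names, not the paper's actual algorithm: a game between a policy player (value-based best responses fit on offline data) and a constraint player (a budget multiplier), with the mixed policy maintained as a single running-average table so only a constant number of policies is ever stored. The environment sizes, the synthetic logged data, and every function name here are assumptions for exposition.

```python
"""Hypothetical sketch of the game-theoretic loop suggested by the abstract.
All sizes, data, and names are illustrative assumptions, not the paper's method."""
import numpy as np

rng = np.random.default_rng(0)
S, A, GAMMA, BUDGET = 5, 3, 0.9, 0.25   # assumed state/action counts and per-step budget cap

# Synthetic offline log of (state, action, reward, cost, next_state) tuples.
batch = []
for _ in range(2000):
    s, a, s2 = int(rng.integers(S)), int(rng.integers(A)), int(rng.integers(S))
    r = rng.random() * (a + 1) / A        # pricier actions earn more reward...
    c = rng.random() * a / (A - 1)        # ...but also spend more budget
    batch.append((s, a, r, c, s2))

def fitted_q(lmbda, sweeps=25):
    """Batch Q-iteration on the scalarized signal r - lambda * c."""
    q = np.zeros((S, A))
    for _ in range(sweeps):
        tot, cnt = np.zeros((S, A)), np.zeros((S, A))
        for s, a, r, c, s2 in batch:
            tot[s, a] += (r - lmbda * c) + GAMMA * q[s2].max()
            cnt[s, a] += 1
        q = np.divide(tot, np.maximum(cnt, 1))
    return q

def avg_cost(policy):
    """Crude per-step cost estimate of `policy` from the logged data."""
    tot, cnt = np.zeros((S, A)), np.zeros((S, A))
    for s, a, _, c, _ in batch:
        tot[s, a] += c
        cnt[s, a] += 1
    return float(np.mean(np.sum(policy * np.divide(tot, np.maximum(cnt, 1)), axis=1)))

lmbda, lr = 0.0, 0.5
mix = np.full((S, A), 1.0 / A)            # the single stored mixed-policy table
for t in range(1, 31):
    greedy = np.eye(A)[fitted_q(lmbda).argmax(axis=1)]       # policy player's best response
    mix += (greedy - mix) / t             # running average: no archive of past policies
    lmbda = max(0.0, lmbda + lr * (avg_cost(mix) - BUDGET))  # constraint player's ascent step

print(f"multiplier {lmbda:.3f}, mixed-policy cost {avg_cost(mix):.3f} (cap {BUDGET})")
```

Keeping the mixture as an incremental average is what makes policy storage constant in this sketch: the loop never archives past best responses, only the current one and their running mean.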

Supplementary Material

MP4 File (24_wsdm2023_cai_marketing_budget_01.mp4-streaming.mp4)
Marketing Budget Allocation with Offline Constrained Deep Reinforcement Learning


Cited By

  • PoRank. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence (2024), 4044-4052. DOI: 10.24963/ijcai.2024/447. Online publication date: 3 Aug 2024.
  • Report on the 16th ACM International Conference on Web Search and Data Mining (WSDM 2023). ACM SIGIR Forum 57, 1 (2023), 1-5. DOI: 10.1145/3636341.3636352. Online publication date: 4 Dec 2023.
  • Model-free Reinforcement Learning with Stochastic Reward Stabilization for Recommender Systems. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (2023), 2179-2183. DOI: 10.1145/3539618.3592022. Online publication date: 19 Jul 2023.

    Published In

    WSDM '23: Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining
    February 2023
    1345 pages
    ISBN:9781450394079
    DOI:10.1145/3539597
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. marketing budget allocation
    2. offline constrained deep RL

    Qualifiers

    • Research-article

    Conference

    WSDM '23

    Acceptance Rates

    Overall Acceptance Rate 498 of 2,863 submissions, 17%
