DOI: 10.1145/3539597.3570486

Marketing Budget Allocation with Offline Constrained Deep Reinforcement Learning

Published: 27 February 2023

Abstract

We study the budget allocation problem in online marketing campaigns that utilize previously collected offline data. We first discuss the long-term effect of optimizing marketing budget allocation decisions in the offline setting and the challenges this poses. To overcome these challenges, we propose a novel game-theoretic offline value-based reinforcement learning method that uses mixed policies. Whereas previous methods must store infinitely many policies, the proposed method stores only a constant number, achieving nearly optimal policy efficiency and making it practical and favorable for industrial use. We further show that this method is guaranteed to converge to the optimal policy, a guarantee that previous value-based reinforcement learning methods for marketing budget allocation cannot provide. Our experiments on a large-scale marketing campaign with tens of millions of users and a budget of more than one billion verify the theoretical results and show that the proposed method outperforms various baseline methods. The proposed method has been successfully deployed to serve all the traffic of this marketing campaign.
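The abstract describes the method only at a high level, so the sketch below is a hypothetical illustration of the general pattern it names, not the paper's actual algorithm: a game between a policy player (value-based best responses fit on offline data) and a constraint player (a budget multiplier), with the mixed policy maintained as a single running-average table so only a constant number of policies is ever stored. The environment sizes, the synthetic logged data, and every function name here are assumptions for exposition.

```python
"""Hypothetical sketch of the game-theoretic loop suggested by the abstract.
All sizes, data, and names are illustrative assumptions, not the paper's method."""
import numpy as np

rng = np.random.default_rng(0)
S, A, GAMMA, BUDGET = 5, 3, 0.9, 0.25   # assumed state/action counts and per-step budget cap

# Synthetic offline log of (state, action, reward, cost, next_state) tuples.
batch = []
for _ in range(2000):
    s, a, s2 = int(rng.integers(S)), int(rng.integers(A)), int(rng.integers(S))
    r = rng.random() * (a + 1) / A        # pricier actions earn more reward...
    c = rng.random() * a / (A - 1)        # ...but also spend more budget
    batch.append((s, a, r, c, s2))

def fitted_q(lmbda, sweeps=25):
    """Batch Q-iteration on the scalarized signal r - lambda * c."""
    q = np.zeros((S, A))
    for _ in range(sweeps):
        tot, cnt = np.zeros((S, A)), np.zeros((S, A))
        for s, a, r, c, s2 in batch:
            tot[s, a] += (r - lmbda * c) + GAMMA * q[s2].max()
            cnt[s, a] += 1
        q = np.divide(tot, np.maximum(cnt, 1))
    return q

def avg_cost(policy):
    """Crude per-step cost estimate of `policy` from the logged data."""
    tot, cnt = np.zeros((S, A)), np.zeros((S, A))
    for s, a, _, c, _ in batch:
        tot[s, a] += c
        cnt[s, a] += 1
    return float(np.mean(np.sum(policy * np.divide(tot, np.maximum(cnt, 1)), axis=1)))

lmbda, lr = 0.0, 0.5
mix = np.full((S, A), 1.0 / A)            # the single stored mixed-policy table
for t in range(1, 31):
    greedy = np.eye(A)[fitted_q(lmbda).argmax(axis=1)]       # policy player's best response
    mix += (greedy - mix) / t             # running average: no archive of past policies
    lmbda = max(0.0, lmbda + lr * (avg_cost(mix) - BUDGET))  # constraint player's ascent step

print(f"multiplier {lmbda:.3f}, mixed-policy cost {avg_cost(mix):.3f} (cap {BUDGET})")
```

Keeping the mixture as an incremental average is what makes policy storage constant in this sketch: the loop never archives past best responses, only the current one and their running mean.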

Supplementary Material

MP4 File (24_wsdm2023_cai_marketing_budget_01.mp4-streaming.mp4)
Marketing Budget Allocation with Offline Constrained Deep Reinforcement Learning


Cited By

  • PoRank. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence (2024), 4044-4052. DOI: 10.24963/ijcai.2024/447. Online publication date: 3 Aug 2024.
  • Report on the 16th ACM International Conference on Web Search and Data Mining (WSDM 2023). ACM SIGIR Forum 57, 1 (2023), 1-5. DOI: 10.1145/3636341.3636352. Online publication date: 4 Dec 2023.
  • Model-free Reinforcement Learning with Stochastic Reward Stabilization for Recommender Systems. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (2023), 2179-2183. DOI: 10.1145/3539618.3592022. Online publication date: 19 Jul 2023.

    Published In

    WSDM '23: Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining
    February 2023
    1345 pages
    ISBN:9781450394079
    DOI:10.1145/3539597
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. marketing budget allocation
    2. offline constrained deep RL

    Qualifiers

    • Research-article

    Conference

    WSDM '23

    Acceptance Rates

    Overall Acceptance Rate 498 of 2,863 submissions, 17%
