DOI: 10.1145/3485447.3512109
research-article

Cross DQN: Cross Deep Q Network for Ads Allocation in Feed

Published: 25 April 2022

    Abstract

    E-commerce platforms usually display a mixed list of ads and organic items in the feed. A key problem is how to allocate the limited feed slots so as to maximize overall revenue while also improving user experience, which requires a good model of user preference. Rather than modeling the influence of individual items on user behavior, the arrangement signal captures the influence of how the items are arranged, and it may lead to a better allocation strategy. However, most previous strategies do not model this signal and therefore perform suboptimally. In addition, the percentage of ads exposed (PAE) is an important indicator in ads allocation: an excessive PAE hurts user experience, while too low a PAE reduces platform revenue. The challenge is therefore to keep the PAE within a certain range while still providing personalized recommendations under that constraint.
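As a minimal illustration of the PAE indicator, the Python sketch below assumes that PAE is pooled as the number of ad slots divided by all exposed slots across a set of requests; the abstract does not state the exact aggregation, so treat this as an assumption. The function names and the target band `[lower, upper]` are hypothetical.

```python
from typing import Sequence


def percentage_of_ads_exposed(feeds: Sequence[Sequence[bool]]) -> float:
    """Hypothetical PAE: share of exposed slots that hold ads, pooled over requests.

    Each inner sequence is one request's exposed feed; True marks an ad slot.
    """
    exposed = sum(len(feed) for feed in feeds)
    if exposed == 0:
        raise ValueError("no exposed items")
    ads = sum(sum(1 for is_ad in feed if is_ad) for feed in feeds)
    return ads / exposed


def pae_in_range(pae: float, lower: float, upper: float) -> bool:
    """Check the PAE against an assumed target band [lower, upper]."""
    return lower <= pae <= upper


feeds = [
    [True, False, False, False, True],   # 2 ads out of 5 slots
    [False, False, True, False, False],  # 1 ad out of 5 slots
]
pae = percentage_of_ads_exposed(feeds)
print(pae)                           # 0.3
print(pae_in_range(pae, 0.2, 0.35))  # True
```

Pooling over slots, rather than averaging per-request ratios, weights longer feeds more heavily; a per-request average is an equally plausible choice when feeds differ in length.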
    In this paper, we propose Cross Deep Q Network (Cross DQN), which extracts the crucial arrangement signal by crossing the embeddings of different items and modeling the crossed sequence with multi-channel attention. We also propose an auxiliary loss that imposes a batch-level constraint on PAE to address the challenge above. In offline experiments, our model achieves higher revenue and better user experience than state-of-the-art baselines. Moreover, it shows a significant improvement in an online A/B test and has been fully deployed on the Meituan feed, serving more than 300 million customers.
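The abstract does not give the form of the auxiliary loss, so the following is only a hypothetical sketch of how a batch-level PAE constraint could be attached to a DQN objective, not Cross DQN's actual loss. It assumes each request has a few candidate arrangements with known ad shares, relaxes the greedy choice into a softmax over Q-values so that the batch PAE depends smoothly on the Q-values, and penalizes a batch PAE outside an assumed band `[lower, upper]` with a squared hinge. The TD term uses the standard DQN target r + gamma * max Q(s', a'). All names, the temperature, `gamma` and `alpha` are assumptions; in a real training loop these would be tensor operations so gradients reach the Q-network.

```python
import math
from typing import List, Sequence


def softmax(values: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over one request's Q-values."""
    scaled = [v / temperature for v in values]
    m = max(scaled)
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def batch_pae(q_values: Sequence[Sequence[float]],
              ad_share: Sequence[Sequence[float]],
              temperature: float = 1.0) -> float:
    """Expected batch PAE under an assumed softmax policy over arrangements.

    q_values[i][a]: Q-value of candidate arrangement a for request i.
    ad_share[i][a]: fraction of slots that arrangement a gives to ads.
    """
    per_request = []
    for qs, shares in zip(q_values, ad_share):
        probs = softmax(qs, temperature)
        per_request.append(sum(p * s for p, s in zip(probs, shares)))
    return sum(per_request) / len(per_request)


def pae_penalty(pae: float, lower: float, upper: float) -> float:
    """Hypothetical squared hinge: zero inside [lower, upper], quadratic outside."""
    return max(0.0, lower - pae) ** 2 + max(0.0, pae - upper) ** 2


def td_loss(q_taken: Sequence[float], rewards: Sequence[float],
            next_max_q: Sequence[float], gamma: float = 0.9) -> float:
    """Mean squared error against the DQN target r + gamma * max_a' Q(s', a')."""
    errors = [(r + gamma * nq - q) ** 2
              for q, r, nq in zip(q_taken, rewards, next_max_q)]
    return sum(errors) / len(errors)


def total_loss(td: float, pae: float, lower: float, upper: float,
               alpha: float = 1.0) -> float:
    """TD loss plus an alpha-weighted batch-level PAE auxiliary term."""
    return td + alpha * pae_penalty(pae, lower, upper)


# Example batch: two requests, two candidate arrangements each (made-up numbers).
q_values = [[1.2, 0.8], [0.5, 0.9]]
ad_share = [[0.2, 0.4], [0.2, 0.6]]
pae = batch_pae(q_values, ad_share)
loss = total_loss(td_loss([1.2, 0.9], [0.3, 0.4], [1.0, 1.1]),
                  pae, lower=0.2, upper=0.3)
print(round(pae, 4), round(loss, 4))
```

Computing the constraint over the whole batch, rather than per request, lets individual requests show more or fewer ads as long as the batch average stays in the band, which is one way to keep allocation personalized under a PAE constraint.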

    Information

    Published In

    WWW '22: Proceedings of the ACM Web Conference 2022
    April 2022
    3764 pages
    ISBN:9781450390965
    DOI:10.1145/3485447
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Adaptive Ads Exposure
    2. Ads Allocation
    3. Arrangement Signal
    4. Deep Reinforcement Learning

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    WWW '22: The ACM Web Conference 2022
    April 25 - 29, 2022
    Virtual Event, Lyon, France

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Bibliometrics & Citations

    Article Metrics

    • Downloads (last 12 months): 105
    • Downloads (last 6 weeks): 11
    Reflects downloads up to 27 Jul 2024.

    Cited By

    • (2024) Utility-oriented Reranking with Counterfactual Context. ACM Transactions on Knowledge Discovery from Data. DOI: 10.1145/3671004. Online publication date: 4-Jun-2024.
    • (2024) DeCoCDR: Deployable Cloud-Device Collaboration for Cross-Domain Recommendation. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2114–2123. DOI: 10.1145/3626772.3657786. Online publication date: 10-Jul-2024.
    • (2024) User Response Modeling in Reinforcement Learning for Ads Allocation. Companion Proceedings of the ACM on Web Conference 2024, 131–140. DOI: 10.1145/3589335.3648310. Online publication date: 13-May-2024.
    • (2024) Ad vs Organic: Revisiting Incentive Compatible Mechanism Design in E-commerce Platforms. Proceedings of the ACM on Web Conference 2024, 235–244. DOI: 10.1145/3589334.3645638. Online publication date: 13-May-2024.
    • (2024) Adaptive Fusion and Transfer Learning for Enhanced E-Commerce Recommendations. Procedia Computer Science 229:C, 345–356. DOI: 10.1016/j.procs.2023.12.037. Online publication date: 14-Mar-2024.
    • (2023) PIER: Permutation-Level Interest-Based End-to-End Re-ranking Framework in E-commerce. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 4823–4831. DOI: 10.1145/3580305.3599886. Online publication date: 6-Aug-2023.
    • (2023) On-device Integrated Re-ranking with Heterogeneous Behavior Modeling. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 5225–5236. DOI: 10.1145/3580305.3599878. Online publication date: 6-Aug-2023.
    • (2023) Multi-channel Integrated Recommendation with Exposure Constraints. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 5338–5349. DOI: 10.1145/3580305.3599868. Online publication date: 6-Aug-2023.
    • (2023) RL-MPCA: A Reinforcement Learning Based Multi-Phase Computation Allocation Approach for Recommender Systems. Proceedings of the ACM Web Conference 2023, 3214–3224. DOI: 10.1145/3543507.3583313. Online publication date: 30-Apr-2023.
    • (2023) MDDL: A Framework for Reinforcement Learning-based Position Allocation in Multi-Channel Feed. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2159–2163. DOI: 10.1145/3539618.3592018. Online publication date: 19-Jul-2023.
