research-article

Cross DQN: Cross Deep Q Network for Ads Allocation in Feed

Authors:

Dong Wang

WWW '22: Proceedings of the ACM Web Conference 2022

Pages 401 - 409

https://doi.org/10.1145/3485447.3512109

Published: 25 April 2022

Abstract

E-commerce platforms usually display a mixed list of ads and organic items in their feeds. A key problem is allocating the limited feed slots so as to maximize overall revenue while also improving user experience, which requires a good model of user preference. Rather than reflecting the influence of individual items on user behavior, the arrangement signal reflects the influence of how the items are arranged, and may therefore lead to a better allocation strategy. However, most previous strategies fail to model this signal and consequently perform suboptimally. In addition, the percentage of ads exposed (PAE) is an important indicator in ads allocation: an excessively high PAE hurts user experience, while a PAE that is too low reduces platform revenue. Constraining the PAE within a certain range while still providing personalized recommendations under that constraint is therefore a challenge.
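
The PAE indicator and the idea of keeping it within a range can be made concrete with a short sketch. It assumes each request's exposed slots are given as booleans (True for an ad), expresses PAE as a fraction in [0, 1] rather than a percentage, pools it over all exposed slots in a batch, and scores the batch against a target range [low, high] with a squared-hinge penalty. The function names, the pooling choice, the penalty form, and the example range are illustrative assumptions; the abstract does not specify how the paper computes batch-level PAE or the form of its auxiliary loss.

```python
# Illustrative sketch (assumption): PAE is expressed as a fraction in [0, 1],
# pooled over all exposed slots in a batch, and kept in range with a
# squared-hinge penalty. None of these choices is taken from the paper.
from typing import Sequence


def pae(exposed_is_ad: Sequence[bool]) -> float:
    """Fraction of exposed slots in one request that hold an ad."""
    if not exposed_is_ad:
        raise ValueError("request has no exposed slots")
    return sum(exposed_is_ad) / len(exposed_is_ad)


def batch_pae(batch: Sequence[Sequence[bool]]) -> float:
    """PAE pooled over every exposed slot of every request in the batch."""
    total = sum(len(request) for request in batch)
    if total == 0:
        raise ValueError("batch has no exposed slots")
    ads = sum(sum(request) for request in batch)
    return ads / total


def pae_penalty(value: float, low: float, high: float) -> float:
    """Zero inside [low, high]; grows quadratically with the distance outside."""
    if low > high:
        raise ValueError("low must not exceed high")
    below = max(0.0, low - value)
    above = max(0.0, value - high)
    return below ** 2 + above ** 2


if __name__ == "__main__":
    batch = [
        [True, False, False, False, True],   # 2 ads in 5 exposed slots
        [False, False, True, False, False],  # 1 ad in 5 exposed slots
    ]
    value = batch_pae(batch)  # 3 ads / 10 slots
    # Hypothetical target range; a real system would set it by business rules.
    print(value, pae_penalty(value, low=0.1, high=0.25))
```

Pooling over all slots weights every exposure equally; averaging the per-request PAE instead would weight every request equally, and the two differ when requests expose different numbers of slots.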

In this paper, we propose Cross Deep Q Network (Cross DQN), which extracts the crucial arrangement signal by crossing the embeddings of different items and modeling the crossed sequence with multi-channel attention. We also propose an auxiliary loss that imposes a batch-level constraint on PAE to address the challenge described above. In offline experiments, our model achieves higher revenue and better user experience than state-of-the-art baselines. Moreover, it shows a significant improvement in the online A/B test and has been fully deployed in the Meituan feed, serving more than 300 million customers.
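
The crossing step and the attention over the crossed sequence can be illustrated with a minimal pure-Python sketch. It rests on several assumptions not taken from the paper: an action is a 0/1 vector over the slots of one screen (1 = ad); "crossing" means filling each slot with the next candidate ad or organic embedding in the order the action implies; "multi-channel attention" is approximated by parameter-free scaled dot-product self-attention applied separately to equal-width slices (channels) of each embedding; and the Q-value is a toy linear readout. The names `cross`, `multi_channel_attention`, and `q_value`, the embeddings, and the readout weights are all hypothetical. This is not the published Cross DQN architecture, which has learned parameters.

```python
# Hypothetical sketch, not the published Cross DQN architecture: attention has
# no learned projections, and the linear readout weights are made up.
import math
from typing import List, Sequence

Vector = List[float]


def cross(ads: Sequence[Vector], organics: Sequence[Vector],
          action: Sequence[int]) -> List[Vector]:
    """Fill each slot with the next ad (bit 1) or next organic item (bit 0)."""
    ad_iter, organic_iter = iter(ads), iter(organics)
    crossed: List[Vector] = []
    for bit in action:
        source = ad_iter if bit else organic_iter
        try:
            crossed.append(list(next(source)))
        except StopIteration:
            raise ValueError("action needs more candidates than provided") from None
    return crossed


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq: Sequence[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention with queries = keys = values = seq."""
    d = len(seq[0])
    out: List[Vector] = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, seq)) for j in range(d)])
    return out


def multi_channel_attention(seq: Sequence[Vector], channels: int) -> List[Vector]:
    """Split each embedding into equal-width channels and attend within each."""
    if not seq:
        return []
    d = len(seq[0])
    if channels <= 0 or d % channels:
        raise ValueError("embedding size must be divisible by a positive channel count")
    width = d // channels
    result: List[Vector] = [[] for _ in seq]
    for c in range(channels):
        part = [row[c * width:(c + 1) * width] for row in seq]
        for i, row in enumerate(self_attention(part)):
            result[i].extend(row)
    return result


def q_value(ads: Sequence[Vector], organics: Sequence[Vector], action: Sequence[int],
            channels: int, readout: Sequence[float]) -> float:
    """Toy Q(s, a): linear readout over the flattened, slot-ordered attention output."""
    attended = multi_channel_attention(cross(ads, organics, action), channels)
    flat = [x for row in attended for x in row]
    if len(readout) != len(flat):
        raise ValueError("readout needs one weight per slot and dimension")
    return sum(w * x for w, x in zip(readout, flat))


if __name__ == "__main__":
    ads = [[1.0, 0.0, 0.5, 0.5], [0.2, 0.8, 0.1, 0.9]]
    organics = [[0.0, 1.0, 0.3, 0.7], [0.5, 0.5, 0.9, 0.1], [0.4, 0.6, 0.2, 0.8]]
    readout = [0.3, -0.1, 0.2, 0.4] * 3  # hypothetical weights: 3 slots x 4 dims
    # Candidate actions for a 3-slot screen: at most one ad, in any position.
    actions = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
    best = max(actions, key=lambda a: q_value(ads, organics, a, 2, readout))
    print("greedy action:", best)
```

Because the readout flattens the attended sequence slot by slot instead of pooling it, the same candidates placed in different slots generally receive different scores, which is the kind of arrangement sensitivity the abstract argues for. Picking the action with the highest score mirrors greedy action selection in a DQN.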

Index Terms

Cross DQN: Cross Deep Q Network for Ads Allocation in Feed
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning
    2. Machine learning approaches
2. Information systems

Index terms have been assigned to the content through auto-classification.

Published In

WWW '22: Proceedings of the ACM Web Conference 2022

April 2022

3764 pages

ISBN: 9781450390965

DOI: 10.1145/3485447

Editors:
Frédérique Laforest, INSA Lyon, France
Raphaël Troncy, EURECOM, France
Elena Simperl, King’s College London, UK
Deepak Agarwal, Pinterest, USA
Aristides Gionis, KTH Royal Institute of Technology, Sweden
Ivan Herman, W3C / retired
Lionel Médini, Université Lyon 1, France

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

WWW '22

WWW '22: The ACM Web Conference 2022

April 25 - 29, 2022

Virtual Event, Lyon, France

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
