research-article

Exploration and Regularization of the Latent Action Space in Recommendation

Authors:

Yongfeng ZhangAuthors Info & Claims

WWW '23: Proceedings of the ACM Web Conference 2023

Pages 833 - 844

https://doi.org/10.1145/3543507.3583244

Published: 30 April 2023 Publication History

Abstract

In recommender systems, reinforcement learning solutions have effectively boosted recommendation performance because of their ability to capture long-term user-system interaction. However, the action space of the recommendation policy is a list of items, which could be extremely large with a dynamic candidate item pool. To overcome this challenge, we propose a hyper-actor and critic learning framework where the policy decomposes the item list generation process into a hyper-action inference step and an effect-action selection step. The first step maps the given state space into a vectorized hyper-action space, and the second step selects the item list based on the hyper-action. In order to regulate the discrepancy between the two action spaces, we design an alignment module along with a kernel mapping function for items to ensure inference accuracy and include a supervision module to stabilize the learning process. We build simulated environments on public datasets and empirically show that our framework is superior in recommendation compared to standard RL baselines.

References

[1]

M Mehdi Afsar, Trafford Crump, and Behrouz Far. 2021. Reinforcement learning based recommender systems: A survey. ACM Computing Surveys (CSUR) (2021).

[2]

Xueying Bai, Jian Guan, and Hongning Wang. 2019. A model-based reinforcement learning with adversarial training for online recommendation. Advances in Neural Information Processing Systems 32 (2019).

[3]

Qingpeng Cai, Ruohan Zhan, Chi Zhang, Jie Zheng, Guangwei Ding, Pinghua Gong, Dong Zheng, and Peng Jiang. 2022. Constrained Reinforcement Learning for Short Video Recommendation. arXiv preprint arXiv:2205.13248 (2022).

[4]

Yash Chandak, Georgios Theocharous, James Kostas, Scott Jordan, and Philip Thomas. 2019. Learning action representations for reinforcement learning. In International conference on machine learning. PMLR, 941–950.

[5]

Jianxin Chang, Chen Gao, Yu Zheng, Yiqun Hui, Yanan Niu, Yang Song, Depeng Jin, and Yong Li. 2021. Sequential recommendation with graph neural networks. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 378–387.

Digital Library

[6]

Haokun Chen, Xinyi Dai, Han Cai, Weinan Zhang, Xuejian Wang, Ruiming Tang, Yuzhou Zhang, and Yong Yu. 2019. Large-scale interactive recommendation with tree-structured policy gradient. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 3312–3320.

Digital Library

[7]

Hanxiong Chen, Shaoyun Shi, Yunqi Li, and Yongfeng Zhang. 2021. Neural collaborative reasoning. In Proceedings of the Web Conference 2021. 1516–1527.

Digital Library

[8]

Minmin Chen, Alex Beutel, Paul Covington, Sagar Jain, Francois Belletti, and Ed H Chi. 2019. Top-k off-policy correction for a REINFORCE recommender system. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining. 456–464.

Digital Library

[9]

Minmin Chen, Bo Chang, Can Xu, and Ed H Chi. 2021. User response models to improve a reinforce recommender system. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining. 121–129.

Digital Library

[10]

Xu Chen, Hongteng Xu, Yongfeng Zhang, Jiaxi Tang, Yixin Cao, Zheng Qin, and Hongyuan Zha. 2018. Sequential recommendation with user memory networks. In Proceedings of the eleventh ACM international conference on web search and data mining. 108–116.

Digital Library

[11]

Xiaocong Chen, Lina Yao, Aixin Sun, Xianzhi Wang, Xiwei Xu, and Liming Zhu. 2021. Generative inverse deep reinforcement learning for online recommendation. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 201–210.

Digital Library

[12]

Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, 2016. Wide & deep learning for recommender systems. In Proceedings of the 1st workshop on deep learning for recommender systems. 7–10.

Digital Library

[13]

Gabriel Dulac-Arnold, Richard Evans, Hado van Hasselt, Peter Sunehag, Timothy Lillicrap, Jonathan Hunt, Timothy Mann, Theophane Weber, Thomas Degris, and Ben Coppin. 2015. Deep reinforcement learning in large discrete action spaces. arXiv preprint arXiv:1512.07679 (2015).

[14]

Scott Fujimoto and Shixiang Shane Gu. 2021. A minimalist approach to offline reinforcement learning. Advances in neural information processing systems 34 (2021), 20132–20145.

[15]

Scott Fujimoto, Herke Hoof, and David Meger. 2018. Addressing function approximation error in actor-critic methods. In International conference on machine learning. PMLR, 1587–1596.

[16]

Chongming Gao, Shijun Li, Yuan Zhang, Jiawei Chen, Biao Li, Wenqiang Lei, Peng Jiang, and Xiangnan He. 2022. KuaiRand: An Unbiased Sequential Recommendation Dataset with Randomly Exposed Videos. In Proceedings of the 31st ACM International Conference on Information and Knowledge Management (Atlanta, GA, USA) (CIKM ’22). 5 pages. https://doi.org/10.1145/3511808.3557624

Digital Library

[17]

Yingqiang Ge, Shuchang Liu, Ruoyuan Gao, Yikun Xian, Yunqi Li, Xiangyu Zhao, Changhua Pei, Fei Sun, Junfeng Ge, Wenwu Ou, and Yongfeng Zhang. 2021. Towards Long-term Fairness in Recommendation. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining. 445–453.

Digital Library

[18]

Yingqiang Ge, Xiaoting Zhao, Lucia Yu, Saurabh Paul, Diane Hu, Chu-Cheng Hsieh, and Yongfeng Zhang. 2022. Toward Pareto efficient fairness-utility trade-off in recommendation through reinforcement learning. In Proceedings of the fifteenth ACM international conference on web search and data mining. 316–324.

Digital Library

[19]

Shijie Geng, Shuchang Liu, Zuohui Fu, Yingqiang Ge, and Yongfeng Zhang. 2022. Recommendation as language processing (rlp): A unified pretrain, personalized prompt & predict paradigm (p5). In Proceedings of the 16th ACM Conference on Recommender Systems. 299–315.

Digital Library

[20]

F Maxwell Harper. 2015. The movielens datasets: History and context. Acm transactions on interactive intelligent systems (tiis) 5 4 (2015) 1–19. F Maxwell Harper and Joseph A Konstan. 2015. The movielens datasets: History and context. Acm transactions on interactive intelligent systems (tiis) 5 4 (2015) 1–19.

[21]

Eugene Ie, Vihan Jain, Jing Wang, Sanmit Narvekar, Ritesh Agarwal, Rui Wu, Heng-Tze Cheng, Tushar Chandra, and Craig Boutilier. 2019. SlateQ: A tractable decomposition for reinforcement learning with recommendation sets. (2019).

[22]

Eugene Ie, Chih wei Hsu, Martin Mladenov, Vihan Jain, Sanmit Narvekar, Jing Wang, Rui Wu, and Craig Boutilier. 2019. RecSim: A Configurable Simulation Platform for Recommender Systems. (2019). arxiv:1909.04847 [cs.LG]

[23]

Dietmar Jannach and Malte Ludewig. 2017. When recurrent neural networks meet the neighborhood for session-based recommendation. In Proceedings of the eleventh ACM conference on recommender systems. 306–310.

Digital Library

[24]

Jianchao Ji, Zelong Li, Shuyuan Xu, Max Xiong, Juntao Tan, Yingqiang Ge, Hao Wang, and Yongfeng Zhang. 2023. Counterfactual Collaborative Reasoning. In Proceedings of the 16th ACM International Conference on Web Search and Data Mining.

Digital Library

[25]

Wang-Cheng Kang and Julian McAuley. 2018. Self-attentive sequential recommendation. In 2018 IEEE international conference on data mining (ICDM). IEEE, 197–206.

[26]

Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer 42, 8 (2009), 30–37.

Digital Library

[27]

Yehuda Koren, Steffen Rendle, and Robert Bell. 2022. Advances in collaborative filtering. Recommender systems handbook (2022), 91–142.

[28]

Zelong Li, Jianchao Ji, Yingqiang Ge, and Yongfeng Zhang. 2022. AutoLossGen: Automatic Loss Function Generation for Recommender Systems. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. 1304–1315.

Digital Library

[29]

Dawen Liang, Rahul G Krishnan, Matthew D Hoffman, and Tony Jebara. 2018. Variational autoencoders for collaborative filtering. In Proceedings of the 2018 world wide web conference. 689–698.

Digital Library

[30]

Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. 2016. Continuous control with deep reinforcement learning. In ICLR (Poster). http://arxiv.org/abs/1509.02971

[31]

Feng Liu, Ruiming Tang, Xutao Li, Weinan Zhang, Yunming Ye, Haokun Chen, Huifeng Guo, Yuzhou Zhang, and Xiuqiang He. 2020. State representation modeling for deep reinforcement learning based recommendation. Knowledge-Based Systems 205 (2020), 106170.

[32]

Tie-Yan Liu 2009. Learning to rank for information retrieval. Foundations and Trends in Information Retrieval 3, 3 (2009), 225–331.

Digital Library

[33]

Tariq Mahmood and Francesco Ricci. 2007. Learning and adaptivity in interactive recommender systems. In Proceedings of the ninth international conference on Electronic commerce. 75–84.

Digital Library

[34]

Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. 2016. Asynchronous methods for deep reinforcement learning. In International conference on machine learning. PMLR, 1928–1937.

[35]

Changhua Pei, Xinru Yang, Qing Cui, Xiao Lin, Fei Sun, Peng Jiang, Wenwu Ou, and Yongfeng Zhang. 2019. Value-aware recommendation based on reinforcement profit maximization. In The World Wide Web Conference. 3123–3129.

Digital Library

[36]

Steffen Rendle, Christoph Freudenthaler, and Lars Schmidt-Thieme. 2010. Factorizing personalized markov chains for next-basket recommendation. In Proceedings of the 19th international conference on World wide web. 811–820.

Digital Library

[37]

Guy Shani, David Heckerman, Ronen I Brafman, and Craig Boutilier. 2005. An MDP-based recommender system.Journal of Machine Learning Research 6, 9 (2005).

[38]

Shaoyun Shi, Hanxiong Chen, Weizhi Ma, Jiaxin Mao, Min Zhang, and Yongfeng Zhang. 2020. Neural logic reasoning. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 1365–1374.

Digital Library

[39]

Akash Srivastava, Lazar Valkov, Chris Russell, Michael U Gutmann, and Charles Sutton. 2017. Veegan: Reducing mode collapse in gans using implicit variational learning. Advances in neural information processing systems 30 (2017).

[40]

Fei Sun, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, and Peng Jiang. 2019. BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer. In Proceedings of the 28th ACM international conference on information and knowledge management. 1441–1450.

Digital Library

[41]

Richard S Sutton and Andrew G Barto. 2018. Reinforcement learning: An introduction. MIT press.

Digital Library

[42]

Nima Taghipour, Ahmad Kardan, and Saeed Shiry Ghidary. 2007. Usage-based web recommendations: a reinforcement learning approach. In Proceedings of the 2007 ACM conference on Recommender systems. 113–120.

Digital Library

[43]

Jiaxi Tang and Ke Wang. 2018. Personalized top-n sequential recommendation via convolutional sequence embedding. In Proceedings of the eleventh ACM international conference on web search and data mining. 565–573.

Digital Library

[44]

Kai Wang, Zhene Zou, Yue Shang, Qilin Deng, Minghao Zhao, Runze Wu, Xudong Shen, Tangjie Lyu, and Changjie Fan. 2021. RL4RS: A Real-World Benchmark for Reinforcement Learning based Recommender System. ArXiv abs/2110.11073 (2021).

[45]

Shoujin Wang, Longbing Cao, Yan Wang, Quan Z Sheng, Mehmet A Orgun, and Defu Lian. 2021. A survey on session-based recommender systems. ACM Computing Surveys (CSUR) 54, 7 (2021), 1–38.

Digital Library

[46]

Shu Wu, Yuyuan Tang, Yanqiao Zhu, Liang Wang, Xing Xie, and Tieniu Tan. 2019. Session-based recommendation with graph neural networks. In Proceedings of the AAAI conference on artificial intelligence, Vol. 33. 346–353.

Digital Library

[47]

Yikun Xian, Zuohui Fu, Shan Muthukrishnan, Gerard De Melo, and Yongfeng Zhang. 2019. Reinforcement knowledge graph reasoning for explainable recommendation. In Proceedings of the 42nd international ACM SIGIR conference on research and development in information retrieval. 285–294.

Digital Library

[48]

Xin Xin, Alexandros Karatzoglou, Ioannis Arapakis, and Joemon M Jose. 2020. Self-supervised reinforcement learning for recommender systems. In Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval. 931–940.

Digital Library

[49]

Xin Xin, Alexandros Karatzoglou, Ioannis Arapakis, and Joemon M. Jose. 2022. Supervised Advantage Actor-Critic for Recommender Systems(WSDM ’22). Association for Computing Machinery, New York, NY, USA, 1186–1196. https://doi.org/10.1145/3488560.3498494

Digital Library

[50]

Xinyang Yi, Ji Yang, Lichan Hong, Derek Zhiyuan Cheng, Lukasz Heldt, Aditee Kumthekar, Zhe Zhao, Li Wei, and Ed Chi. 2019. Sampling-bias-corrected neural modeling for large corpus item recommendations. In Proceedings of the 13th ACM Conference on Recommender Systems. 269–277.

Digital Library

[51]

Xiangyu Zhao, Changsheng Gu, Haoshenglun Zhang, Xiwang Yang, Xiaobing Liu, Hui Liu, and Jiliang Tang. 2021. DEAR: Deep Reinforcement Learning for Online Advertising Impression in Recommender Systems. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 750–758.

[52]

Xiangyu Zhao, Long Xia, Jiliang Tang, and Dawei Yin. 2019. " Deep reinforcement learning for search, recommendation, and online advertising: a survey" by Xiangyu Zhao, Long Xia, Jiliang Tang, and Dawei Yin with Martin Vesely as coordinator. ACM sigweb newsletterSpring (2019), 1–15.

[53]

Xiangyu Zhao, Long Xia, Liang Zhang, Zhuoye Ding, Dawei Yin, and Jiliang Tang. 2018. Deep reinforcement learning for page-wise recommendations. In Proceedings of the 12th ACM Conference on Recommender Systems. 95–103.

Digital Library

[54]

Xiangyu Zhao, Long Xia, Lixin Zou, Hui Liu, Dawei Yin, and Jiliang Tang. 2020. Whole-chain recommendations. In Proceedings of the 29th ACM international conference on information & knowledge management. 1883–1891.

Digital Library

[55]

Xiangyu Zhao, Long Xia, Lixin Zou, Hui Liu, Dawei Yin, and Jiliang Tang. 2021. UserSim: User Simulation via Supervised GenerativeAdversarial Network. In Proceedings of the Web Conference 2021. 3582–3589.

Digital Library

[56]

Xiangyu Zhao, Liang Zhang, Zhuoye Ding, Long Xia, Jiliang Tang, and Dawei Yin. 2018. Recommendations with negative feedback via pairwise deep reinforcement learning. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1040–1048.

Digital Library

Cited By

Li PTuzhilin A(2024)A Dynamical System Framework for Exploring Consumer Trajectories in Recommender SystemSSRN Electronic Journal10.2139/ssrn.4847168Online publication date: 2024
https://doi.org/10.2139/ssrn.4847168
Deffayet RThonet THwang DLehoux VRenders Jde Rijke M(2024)SARDINE: Simulator for Automated Recommendation in Dynamic and Interactive EnvironmentsACM Transactions on Recommender Systems10.1145/36564812:3(1-34)Online publication date: 5-Jun-2024
https://dl.acm.org/doi/10.1145/3656481
Liu ZLiu SYang BXue ZCai QZhao XZhang ZHu LLi HJiang PBaeza-Yates RBonchi F(2024)Modeling User Retention through Generative Flow NetworksProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3637528.3671531(5497-5508)Online publication date: 25-Aug-2024
https://dl.acm.org/doi/10.1145/3637528.3671531
Show More Cited By

Index Terms

Exploration and Regularization of the Latent Action Space in Recommendation
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Reinforcement learning
        Sequential decision making
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Recommender systems

Recommendations

Latent space regularization for recommender systems

The primary latent factor model cannot effectively optimize the user-item latent spaces because of the sparsity and imbalance of the rating data. Although existing studies have focused on exploring auxiliary information for users or items, few ...
Value-aware Recommendation based on Reinforcement Profit Maximization
WWW '19: The World Wide Web Conference

Existing recommendation algorithms mostly focus on optimizing traditional recommendation measures, such as the accuracy of rating prediction in terms of RMSE or the quality of top-k recommendation lists in terms of precision, recall, MAP, etc. However, ...
Exploration in Interactive Personalized Music Recommendation: A Reinforcement Learning Approach

Current music recommender systems typically act in a greedy manner by recommending songs with the highest user ratings. Greedy recommendation, however, is suboptimal over the long term: it does not actively gather information on user preferences and ...

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences

WWW '23: Proceedings of the ACM Web Conference 2023

April 2023

4293 pages

ISBN:9781450394161

DOI:10.1145/3543507

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 April 2023

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed limited

Conference

WWW '23

Sponsor:

SIGWEB

WWW '23: The ACM Web Conference 2023

April 30 - May 4, 2023

TX, Austin, USA

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

11
Total Citations
View Citations
321
Total Downloads

Downloads (Last 12 months)143
Downloads (Last 6 weeks)8

Reflects downloads up to 15 Oct 2024

Other Metrics

View Author Metrics

Citations

Cited By

Li PTuzhilin A(2024)A Dynamical System Framework for Exploring Consumer Trajectories in Recommender SystemSSRN Electronic Journal10.2139/ssrn.4847168Online publication date: 2024
https://doi.org/10.2139/ssrn.4847168
Deffayet RThonet THwang DLehoux VRenders Jde Rijke M(2024)SARDINE: Simulator for Automated Recommendation in Dynamic and Interactive EnvironmentsACM Transactions on Recommender Systems10.1145/36564812:3(1-34)Online publication date: 5-Jun-2024
https://dl.acm.org/doi/10.1145/3656481
Liu ZLiu SYang BXue ZCai QZhao XZhang ZHu LLi HJiang PBaeza-Yates RBonchi F(2024)Modeling User Retention through Generative Flow NetworksProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3637528.3671531(5497-5508)Online publication date: 25-Aug-2024
https://dl.acm.org/doi/10.1145/3637528.3671531
Wang XLiu SWang XCai QHu LLi HJiang PGai KXie GBaeza-Yates RBonchi F(2024)Future Impact Decomposition in Request-level RecommendationsProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3637528.3671506(5905-5916)Online publication date: 25-Aug-2024
https://dl.acm.org/doi/10.1145/3637528.3671506
Cai QZhao XPan LXin XHuang JZhang WZhao LYin DYang GHui Yang GWang HHan SHauff CZuccon GZhang Y(2024)AgentIR: 1st Workshop on Agent-based Information RetrievalProceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3626772.3657989(3025-3028)Online publication date: 10-Jul-2024
https://dl.acm.org/doi/10.1145/3626772.3657989
Yu YGao CChen JTang HSun YChen QMa WZhang MHui Yang GWang HHan SHauff CZuccon GZhang Y(2024)EasyRL4Rec: An Easy-to-use Library for Reinforcement Learning Based Recommender SystemsProceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3626772.3657868(977-987)Online publication date: 10-Jul-2024
https://dl.acm.org/doi/10.1145/3626772.3657868
Qin WCao ZYu WSi ZChen SXu JHui Yang GWang HHan SHauff CZuccon GZhang Y(2024)Explicitly Integrating Judgment Prediction with Legal Document Retrieval: A Law-Guided Generative ApproachProceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3626772.3657717(2210-2220)Online publication date: 10-Jul-2024
https://dl.acm.org/doi/10.1145/3626772.3657717
Zhang CChen SZhang XDai SYu WXu JHui Yang GWang HHan SHauff CZuccon GZhang Y(2024)Reinforcing Long-Term Performance in Recommender Systems with User-Oriented Exploration PolicyProceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3626772.3657714(1850-1860)Online publication date: 10-Jul-2024
https://dl.acm.org/doi/10.1145/3626772.3657714
Zhang ZZhang QWu XShi XLiao GWang YWang XZhao DChua TNgo CKumar RLauw HKa-Wei Lee R(2024)User Response Modeling in Reinforcement Learning for Ads AllocationCompanion Proceedings of the ACM Web Conference 202410.1145/3589335.3648310(131-140)Online publication date: 13-May-2024
https://dl.acm.org/doi/10.1145/3589335.3648310
Liu SCai QHe ZSun BMcAuley JZheng DJiang PGai KSingh ASun YAkoglu LGunopulos DYan XKumar ROzcan FYe J(2023)Generative Flow Network for Listwise RecommendationProceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3580305.3599364(1524-1534)Online publication date: 6-Aug-2023
https://dl.acm.org/doi/10.1145/3580305.3599364
Show More Cited By

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

HTML Format

View this article in HTML Format.

Media

Figures

Other

Tables

View Table of Contents