DOI: 10.1145/3539597.3570443
research-article
Open access

Meta Policy Learning for Cold-Start Conversational Recommendation

Published: 27 February 2023

Abstract

Conversational recommender systems (CRS) explicitly solicit users' preferences to improve recommendations on the fly. Most existing CRS solutions rely on a single policy trained by reinforcement learning for the whole user population. However, such a global policy is ineffective at satisfying users who are new to the system; this is the cold-start challenge. In this paper, we study CRS policy learning for cold-start users via meta reinforcement learning. We propose to learn a meta policy and adapt it to each new user with only a few trials of conversational recommendation. To facilitate fast policy adaptation, we design three synergetic components. First, we design a meta-exploration policy dedicated to identifying user preferences through a few exploratory conversations, which accelerates personalized policy adaptation from the meta policy. Second, we adapt the item recommendation module for each user to maximize recommendation quality based on the conversation states collected during the conversation. Third, we propose a Transformer-based state encoder as the backbone that connects the first two components. It provides comprehensive state representations by modeling the complex relations between positive and negative feedback within the conversation. Extensive experiments on three datasets demonstrate the advantage of our solution in serving new users, compared with a rich set of state-of-the-art CRS solutions.
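The central idea of the abstract, a shared meta policy that is personalized from only a few conversational trials, can be illustrated with a small sketch. This is not the paper's method: it assumes a linear item scorer, binary accept/reject feedback from a simulated user, and a Reptile-style first-order meta-update chosen here only because it is short to write down. It leaves out the meta-exploration policy and the Transformer-based state encoder entirely, and every name in it (`adapt`, `meta_train`, `sample_user_feedback`, `mean_pref`) is hypothetical.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Example = Tuple[Vector, int]  # (item feature vector, 1 = accepted, 0 = rejected)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus(t: float) -> float:
    # log(1 + exp(t)) without overflow.
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def loss(w: Vector, data: List[Example]) -> float:
    """Mean logistic loss of the linear scorer w on one user's feedback."""
    total = 0.0
    for x, y in data:
        z = dot(w, x)
        total += softplus(-z) if y == 1 else softplus(z)
    return total / len(data)


def grad(w: Vector, data: List[Example]) -> Vector:
    """Gradient of `loss` with respect to w."""
    g = [0.0] * len(w)
    for x, y in data:
        err = sigmoid(dot(w, x)) - y
        for i, xi in enumerate(x):
            g[i] += err * xi / len(data)
    return g


def adapt(w: Vector, data: List[Example], lr: float = 0.5, steps: int = 3) -> Vector:
    """Inner loop (hypothetical): a few full-batch gradient steps on one user's trials."""
    w = list(w)
    for _ in range(steps):
        g = grad(w, data)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


def sample_user_feedback(pref: Vector, n: int, rng: random.Random) -> List[Example]:
    """Hypothetical user simulator: random items, accepted iff pref . x > 0."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in pref]
        data.append((x, 1 if dot(pref, x) > 0 else 0))
    return data


def meta_train(prefs: List[Vector], dim: int, meta_steps: int = 200,
               eps: float = 0.1, n_trials: int = 5, seed: int = 0) -> Vector:
    """Outer loop (Reptile-style): move the shared init toward each user's adapted weights."""
    rng = random.Random(seed)
    meta_w = [0.0] * dim
    for _ in range(meta_steps):
        pref = rng.choice(prefs)
        support = sample_user_feedback(pref, n_trials, rng)
        adapted = adapt(meta_w, support)
        meta_w = [m + eps * (a - m) for m, a in zip(meta_w, adapted)]
    return meta_w


# Usage: meta-train on simulated existing users, then personalize for a new user.
rng = random.Random(42)
mean_pref = [1.0, 0.5, -0.5]
train_users = [[m + rng.uniform(-0.2, 0.2) for m in mean_pref] for _ in range(20)]
meta_w = meta_train(train_users, dim=3)

new_user = [m + rng.uniform(-0.2, 0.2) for m in mean_pref]
few_trials = sample_user_feedback(new_user, 5, rng)
personal_w = adapt(meta_w, few_trials)
```

The intent of starting `adapt` from `meta_w` rather than from zeros is that a handful of trials should be enough to personalize the scorer for a cold-start user. In the paper, those few trials are instead gathered by a dedicated meta-exploration policy, and the state fed to the recommender comes from the Transformer encoder rather than from raw item features.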

Supplementary Material

MP4 File (WSDM23-fp0470.mp4)
Presentation video of Meta Policy Learning for Cold-Start Conversational Recommendation
MP4 File (28_wsdm2023_chu_conversational_recommendation_01.mp4-streaming.mp4)
Meta Policy Learning for Cold-Start Conversational Recommendation

Cited By

  • (2024) Cold-start recommendation by personalized embedding region elicitation. Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, pp. 2766-2786. DOI: 10.5555/3702676.3702808. Online publication date: 15-Jul-2024.
  • (2024) FairCRS: Towards User-oriented Fairness in Conversational Recommendation Systems. Proceedings of the 18th ACM Conference on Recommender Systems, pp. 126-136. DOI: 10.1145/3640457.3688150. Online publication date: 8-Oct-2024.
  • (2024) Reformulating Conversational Recommender Systems as Tri-Phase Offline Policy Learning. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, pp. 3135-3144. DOI: 10.1145/3627673.3679792. Online publication date: 21-Oct-2024.

Information & Contributors

Information

Published In

ACM Conferences
WSDM '23: Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining
February 2023
1345 pages
ISBN:9781450394079
DOI:10.1145/3539597
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. conversational recommendation
  2. meta learning
  3. reinforcement learning

Qualifiers

  • Research-article

Conference

WSDM '23

Acceptance Rates

Overall Acceptance Rate 498 of 2,863 submissions, 17%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 514
  • Downloads (last 6 weeks): 59
Reflects downloads up to 06 Feb 2025

Citations

  • (2024) Broadening the View: Demonstration-augmented Prompt Learning for Conversational Recommendation. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 785-795. DOI: 10.1145/3626772.3657755. Online publication date: 10-Jul-2024.
  • (2024) Conversational Recommendation With Online Learning and Clustering on Misspecified Users. IEEE Transactions on Knowledge and Data Engineering, 36(12), pp. 7825-7838. DOI: 10.1109/TKDE.2024.3423442. Online publication date: Dec-2024.
  • (2024) Learning Hierarchical Preferences for Recommendation With Mixture Intention Neural Stochastic Processes. IEEE Transactions on Knowledge and Data Engineering, 36(7), pp. 3237-3251. DOI: 10.1109/TKDE.2023.3348493. Online publication date: Jul-2024.
  • (2024) Counterfactual Explainable Conversational Recommendation. IEEE Transactions on Knowledge and Data Engineering, 36(6), pp. 2388-2400. DOI: 10.1109/TKDE.2023.3322403. Online publication date: Jun-2024.
  • (2024) MetaGA: Metalearning With Graph-Attention for Improved Long-Tail Item Recommendation. IEEE Transactions on Computational Social Systems, 11(5), pp. 6544-6556. DOI: 10.1109/TCSS.2024.3411043. Online publication date: Oct-2024.
  • (2024) User Linguistic Style Awareness and Interest-Driven Conversational Recommender Systems. 2024 IEEE 4th International Conference on Digital Twins and Parallel Intelligence (DTPI), pp. 215-220. DOI: 10.1109/DTPI61353.2024.10778861. Online publication date: 18-Oct-2024.
  • (2024) Meta doubly robust: Debiasing CVR prediction via meta-learning with a small amount of unbiased data. Knowledge-Based Systems, article 112898. DOI: 10.1016/j.knosys.2024.112898. Online publication date: Dec-2024.