research-article

Open access

Meta Policy Learning for Cold-Start Conversational Recommendation

Authors:

Lingfei Wu

WSDM '23: Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining

Pages 222 - 230

https://doi.org/10.1145/3539597.3570443

Published: 27 February 2023

Abstract

Conversational recommender systems (CRS) explicitly solicit users' preferences on the fly to improve recommendations. Most existing CRS solutions rely on a single policy, trained by reinforcement learning, for an entire population of users. For users new to the system, however, such a global policy is ineffective at satisfying them; this is the cold-start challenge. In this paper, we study CRS policy learning for cold-start users via meta reinforcement learning. We propose to learn a meta policy and adapt it to each new user with only a few trials of conversational recommendation. To facilitate fast policy adaptation, we design three synergistic components. First, we design a meta-exploration policy dedicated to identifying user preferences through a few exploratory conversations, which accelerates personalized policy adaptation from the meta policy. Second, we adapt the item recommendation module for each user to maximize recommendation quality based on the conversation states collected during the conversation. Third, we propose a Transformer-based state encoder as the backbone connecting the previous two components; it provides comprehensive state representations by modeling the complex relations between positive and negative feedback during the conversation. Extensive experiments on three datasets demonstrate the advantage of our solution in serving new users compared with a rich set of state-of-the-art CRS solutions.
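The core idea in the abstract, meta-learning a shared starting point and then adapting it to each new user from only a few interactions, can be illustrated with a first-order meta-learning loop in the style of Reptile [32]. The sketch below is a toy under loudly stated assumptions and is not the paper's algorithm: the "policy" is collapsed into a linear user-preference estimate, users are hypothetical preference vectors scattered around a shared center, feedback is a noiseless rating of random item feature vectors, and the meta-exploration policy and Transformer state encoder are omitted entirely. All function names (`sgd_adapt`, `reptile`, `sample_user`, `sample_feedback`) and hyperparameters are hypothetical.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def loss(w, data):
    # Mean squared error of a linear preference estimate on (features, rating) pairs.
    return sum((dot(w, x) - r) ** 2 for x, r in data) / len(data)


def sgd_adapt(theta, data, lr=0.1, steps=5):
    # Inner loop: a few SGD passes over one user's feedback, starting from theta.
    phi = list(theta)
    for _ in range(steps):
        for x, r in data:
            err = dot(phi, x) - r
            phi = [p - lr * err * xi for p, xi in zip(phi, x)]
    return phi


def sample_user(rng, center, noise=0.1):
    # Hypothetical user: a shared population preference plus Gaussian noise.
    return [c + rng.gauss(0.0, noise) for c in center]


def sample_feedback(rng, w, n):
    # Hypothetical feedback: noiseless ratings of n random item feature vectors.
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in w]
        data.append((x, dot(w, x)))
    return data


def reptile(rng, center, meta_iters=200, meta_lr=0.1, shots=5):
    # Outer loop (first-order, Reptile-style): nudge the shared initialization
    # toward each training user's adapted parameters.
    theta = [0.0] * len(center)
    for _ in range(meta_iters):
        w = sample_user(rng, center)
        phi = sgd_adapt(theta, sample_feedback(rng, w, shots))
        theta = [t + meta_lr * (p - t) for t, p in zip(theta, phi)]
    return theta


rng = random.Random(0)
center = [1.0, -0.5, 0.8]
theta = reptile(rng, center)

# A new (cold-start) user with only two observed interactions.
new_user = sample_user(rng, center)
support = sample_feedback(rng, new_user, 2)
query = sample_feedback(rng, new_user, 50)

meta_loss = loss(sgd_adapt(theta, support, steps=1), query)
scratch_loss = loss(sgd_adapt([0.0] * len(center), support, steps=1), query)
print(f"adapted from meta init: {meta_loss:.3f}, from zero init: {scratch_loss:.3f}")
```

In this toy setting, a single SGD pass over two observations starting from the meta-learned initialization should leave a much smaller held-out error than the same pass from a zero initialization, which is the kind of cold-start benefit the abstract targets. A CRS policy would instead be adapted from conversational rewards, for example with policy-gradient updates [42], rather than from a squared-error loss.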

Supplementary Material

MP4 File (WSDM23-fp0470.mp4, 87.10 MB)

Presentation video of Meta Policy Learning for Cold-Start Conversational Recommendation

MP4 File (28_wsdm2023_chu_conversational_recommendation_01.mp4-streaming.mp4, 169.82 MB)

Meta Policy Learning for Cold-Start Conversational Recommendation

References

[1]

Peter Auer. 2002. Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, Vol. 3, Nov (2002), 397--422.

[2]

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014).

[3]

Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. 2011. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011).

[4]

Renqin Cai, Xueying Bai, Zhenrui Wang, Yuling Shi, Parikshit Sondhi, and Hongning Wang. 2018. Modeling sequential online interactive behaviors with temporal point process. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management. 873--882.

[5]

Renqin Cai, Qinglei Wang, Chong Wang, and Xiaobing Liu. 2020. Learning to structure long-term dependence for sequential recommendation. arXiv preprint arXiv:2001.11369 (2020).

[6]

Renqin Cai, Jibang Wu, Aidan San, Chong Wang, and Hongning Wang. 2021. Category-aware collaborative sequential recommendation. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 388--397.

[7]

Dong Chen, Lingfei Wu, Siliang Tang, Xiao Yun, Bo Long, and Yueting Zhuang. 2022. Robust Meta-learning with Sampling Noise and Label Noise via Eigen-Reptile. Proceedings of the 39th International Conference on Machine Learning (2022).

[8]

Konstantina Christakopoulou, Filip Radlinski, and Katja Hofmann. 2016. Towards conversational recommender systems. In Proceedings of the 22nd ACM SIGKDD. 815--824.

[9]

Zhendong Chu, Jing Ma, and Hongning Wang. 2021. Learning from Crowds by Modeling Common Confusions. In AAAI. 5832--5840.

[10]

Zhendong Chu and Hongning Wang. 2021. Improve learning from crowds via generative augmentation. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 167--175.

[11]

Yang Deng, Yaliang Li, Fei Sun, Bolin Ding, and Wai Lam. 2021. Unified Conversational Recommendation Policy Learning via Graph-based Reinforcement Learning. arXiv preprint arXiv:2105.09710 (2021).

[12]

Yan Duan, John Schulman, Xi Chen, Peter L Bartlett, Ilya Sutskever, and Pieter Abbeel. 2016. RL²: Fast reinforcement learning via slow reinforcement learning. arXiv preprint arXiv:1611.02779 (2016).

[13]

Chelsea Finn, Pieter Abbeel, and Sergey Levine. 2017. Model-agnostic meta-learning for fast adaptation of deep networks. In ICML. PMLR, 1126--1135.

[14]

Aurélien Garivier, Tor Lattimore, and Emilie Kaufmann. 2016. On explore-then-commit strategies. Advances in Neural Information Processing Systems, Vol. 29 (2016).

[15]

Alexandre Gilotte, Clément Calauzènes, Thomas Nedelec, Alexandre Abraham, and Simon Dollé. 2018. Offline A/B testing for recommender systems. In Proceedings of the 11th ACM WSDM. 198--206.

[16]

F Maxwell Harper and Joseph A Konstan. 2015. The MovieLens datasets: History and context. ACM TiiS, Vol. 5, 4 (2015), 1--19.

[17]

Ruining He and Julian McAuley. 2016. Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In Proceedings of the 25th International Conference on World Wide Web. 507--517.

[18]

Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural collaborative filtering. In Proceedings of the 26th International Conference on World Wide Web. 173--182.

[19]

John J Hopfield. 1982. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, Vol. 79, 8 (1982), 2554--2558.

[20]

Mengdi Huai, Jianhui Sun, Renqin Cai, Liuyi Yao, and Aidong Zhang. 2020. Malicious attacks against deep reinforcement learning interpretations. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 472--482.

[21]

Jan Humplik, Alexandre Galashov, Leonard Hasenclever, Pedro A Ortega, Yee Whye Teh, and Nicolas Heess. 2019. Meta reinforcement learning as task inference. arXiv preprint arXiv:1905.06424 (2019).

[22]

Pierre-Alexandre Kamienny, Matteo Pirotta, Alessandro Lazaric, Thibault Lavril, Nicolas Usunier, and Ludovic Denoyer. 2020. Learning adaptive exploration strategies in dynamic environments through informed policy regularization. arXiv preprint arXiv:2005.02934 (2020).

[23]

Minseok Kim, Hwanjun Song, Yooju Shin, Dongmin Park, Kijung Shin, and Jae-Gil Lee. 2022. Meta-Learning for Online Update of Recommender Systems. (2022).

[24]

Tor Lattimore and Csaba Szepesvári. 2020. Bandit algorithms. Cambridge University Press.

[25]

Hoyeop Lee, Jinbae Im, Seongwon Jang, Hyunsouk Cho, and Sehee Chung. 2019. MeLU: Meta-learned user preference estimator for cold-start recommendation. In Proceedings of the 25th ACM SIGKDD. 1073--1082.

[26]

Wenqiang Lei, Xiangnan He, Yisong Miao, Qingyun Wu, Richang Hong, Min-Yen Kan, and Tat-Seng Chua. 2020a. Estimation-action-reflection: Towards deep interaction between conversational and recommender systems. In Proceedings of the 13th WSDM Conference. 304--312.

[27]

Wenqiang Lei, Gangyi Zhang, Xiangnan He, Yisong Miao, Xiang Wang, Liang Chen, and Tat-Seng Chua. 2020b. Interactive path reasoning on graph for conversational recommendation. In Proceedings of the 26th ACM SIGKDD. 2073--2083.

[28]

Lihong Li, Jin Young Kim, and Imed Zitouni. 2015. Toward predicting the outcome of an A/B experiment for search relevance. In Proceedings of the Eighth ACM WSDM Conference. 37--46.

[29]

Raymond Li, Samira Kahou, Hannes Schulz, Vincent Michalski, Laurent Charlin, and Chris Pal. 2018. Towards deep conversational recommendations. In Proceedings of the 32nd NeurIPS Conference. 9748--9758.

[30]

Shijun Li, Wenqiang Lei, Qingyun Wu, Xiangnan He, Peng Jiang, and Tat-Seng Chua. 2021. Seamlessly unifying attributes and items: Conversational recommendation for cold-start users. ACM TOIS, Vol. 39, 4 (2021), 1--29.

[31]

Evan Z Liu, Aditi Raghunathan, Percy Liang, and Chelsea Finn. 2021. Decoupling exploration and exploitation for meta-reinforcement learning without sacrifices. In ICML. PMLR, 6925--6935.

[32]

Alex Nichol, Joshua Achiam, and John Schulman. 2018. On first-order meta-learning algorithms. arXiv preprint arXiv:1803.02999 (2018).

[33]

Steffen Rendle. 2010. Factorization machines. In 2010 IEEE International Conference on Data Mining. IEEE, 995--1000.

[34]

Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web. 285--295.

[35]

Tobias Schnabel, Paul N Bennett, Susan T Dumais, and Thorsten Joachims. 2018. Short-term satisfaction and long-term coverage: Understanding how users tolerate algorithmic exploration. In Proceedings of the 11th ACM WSDM. 513--521.

[36]

Yueming Sun and Yi Zhang. 2018. Conversational recommender system. In The 41st ACM SIGIR. 235--244.

[37]

Franck Tétard and Mikael Collan. 2009. Lazy user theory: A dynamic model to understand user selection of products and services. In 2009 42nd Hawaii International Conference on System Sciences. IEEE, 1--9.

[38]

Manasi Vartak, Arvind Thiagarajan, Conrado Miranda, Jeshua Bratman, and Hugo Larochelle. 2017. A meta-learning perspective on cold-start recommendations for items. (2017).

[39]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems. 5998--6008.

[40]

Risto Vuorio, Shao-Hua Sun, Hexiang Hu, and Joseph J Lim. 2019. Multimodal model-agnostic meta-learning via task-aware modulation. arXiv preprint arXiv:1910.13616 (2019).

[41]

Jane X Wang, Zeb Kurth-Nelson, Dhruva Tirumala, Hubert Soyer, Joel Z Leibo, Remi Munos, Charles Blundell, Dharshan Kumaran, and Matt Botvinick. 2016. Learning to reinforcement learn. arXiv preprint arXiv:1611.05763 (2016).

[42]

Ronald J Williams. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, Vol. 8, 3 (1992), 229--256.

[43]

Jibang Wu, Renqin Cai, and Hongning Wang. 2020. Déjà vu: A contextualized temporal attention mechanism for sequential recommendation. In Proceedings of The Web Conference 2020. 2199--2209.

[44]

Junda Wu, Canzhe Zhao, Tong Yu, Jingyang Li, and Shuai Li. 2021. Clustering of Conversational Bandits for User Preference Learning and Elicitation. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 2129--2139.

[45]

Kerui Xu, Jingxuan Yang, Jun Xu, Sheng Gao, Jun Guo, and Ji-Rong Wen. 2021. Adapting User Preference to Online Feedback in Multi-round Conversational Recommendation. In Proceedings of the 14th ACM WSDM. 364--372.

[46]

Fan Yao, Renqin Cai, and Hongning Wang. 2021. Reversible action design for combinatorial optimization with reinforcement learning. arXiv preprint arXiv:2102.07210 (2021).

[47]

Xiaoying Zhang, Hong Xie, Hang Li, and John CS Lui. 2020. Conversational contextual bandit: Algorithm and application. In Proceedings of The Web Conference 2020. 662--672.

[48]

Yiming Zhang, Lingfei Wu, Qi Shen, Yitong Pang, Zhihua Wei, Fangli Xu, Bo Long, and Jian Pei. 2021. Multi-Choice Questions based Multi-Interest Policy Learning for Conversational Recommendation. arXiv preprint arXiv:2112.11775 (2021).

[49]

Canzhe Zhao, Tong Yu, Zhihui Xie, and Shuai Li. 2022. Knowledge-aware Conversational Preference Elicitation with Bandit Feedback. In Proceedings of the ACM Web Conference 2022. 483--492.

[50]

Xiangyu Zhao, Liang Zhang, Zhuoye Ding, Long Xia, Jiliang Tang, and Dawei Yin. 2018. Recommendations with negative feedback via pairwise deep reinforcement learning. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1040--1048.

[51]

Kun Zhou, Yuanhang Zhou, Wayne Xin Zhao, Xiaoke Wang, and Ji-Rong Wen. 2020. Towards Topic-Guided Conversational Recommender System. In Proceedings of the 28th International Conference on Computational Linguistics. 4128--4139.

[52]

Lixin Zou, Long Xia, Yulong Gu, Xiangyu Zhao, Weidong Liu, Jimmy Xiangji Huang, and Dawei Yin. 2020. Neural interactive collaborative filtering. In Proceedings of the 43rd ACM SIGIR. 749--758.

Index Terms

Meta Policy Learning for Cold-Start Conversational Recommendation
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Markov decision processes
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Recommender systems

Recommendations

Unified Conversational Recommendation Policy Learning via Graph-based Reinforcement Learning
SIGIR '21: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval

Conversational recommender systems (CRS) enable the traditional recommender systems to explicitly acquire user preferences towards items and attributes through interactive conversations. Reinforcement learning (RL) is widely adopted to learn ...
Multi-view Hypergraph Contrastive Policy Learning for Conversational Recommendation
SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

Conversational recommendation systems (CRS) aim to interactively acquire user preferences and accordingly recommend items to users. Accurately learning the dynamic user preferences is of crucial importance for CRS. Previous works learn the user ...
Conversational Collaborative Recommendation --- An Experimental Analysis

Traditionally, collaborative recommender systems have been based on a single-shot model of recommendation where a single set of recommendations is generated based on a user's (past) stored preferences. However, content-based recommender system research ...

Information

Published In

ACM Conferences

WSDM '23: Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining

February 2023

1345 pages

ISBN:9781450394079

DOI:10.1145/3539597

General Chairs: Tat-Seng Chua (National University of Singapore), Hady Lauw (Singapore Management University)

Program Chairs: Luo Si (Salesforce), Evimaria Terzi (Boston University), Panayiotis Tsaparas (University of Ioannina)

Copyright © 2023 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 February 2023

Qualifiers

Research-article

Funding Sources

NSF (National Science Foundation)

Conference

WSDM '23

Sponsor:

WSDM '23: The Sixteenth ACM International Conference on Web Search and Data Mining

February 27 - March 3, 2023

Singapore, Singapore

Acceptance Rates

Overall Acceptance Rate 498 of 2,863 submissions, 17%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 20
Total Downloads: 1,101
Downloads (last 12 months): 514
Downloads (last 6 weeks): 59

Reflects downloads up to 06 Feb 2025

Citations

Cited By

Nguyen H, Nguyen D, Doan K, Nguyen V, Kiyavash N, Mooij J (2024). Cold-start recommendation by personalized embedding region elicitation. Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, 2766-2786. DOI: 10.5555/3702676.3702808. Online publication date: 15-Jul-2024.
https://dl.acm.org/doi/10.5555/3702676.3702808
Liu Q, Feng X, Gu T, Liu X (2024). FairCRS: Towards User-oriented Fairness in Conversational Recommendation Systems. Proceedings of the 18th ACM Conference on Recommender Systems, 126-136. DOI: 10.1145/3640457.3688150. Online publication date: 8-Oct-2024.
https://dl.acm.org/doi/10.1145/3640457.3688150
Zhang G, Gao C, Pan H, Teng R, Li R, Serra E, Spezzano F (2024). Reformulating Conversational Recommender Systems as Tri-Phase Offline Policy Learning. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 3135-3144. DOI: 10.1145/3627673.3679792. Online publication date: 21-Oct-2024.
https://dl.acm.org/doi/10.1145/3627673.3679792
Dao H, Deng Y, Le D, Liao L, Hui Yang G, Wang H, Han S, Hauff C, Zuccon G, Zhang Y (2024). Broadening the View: Demonstration-augmented Prompt Learning for Conversational Recommendation. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 785-795. DOI: 10.1145/3626772.3657755. Online publication date: 10-Jul-2024.
https://dl.acm.org/doi/10.1145/3626772.3657755
Dai X, Wang Z, Xie J, Liu X, Lui J (2024). Conversational Recommendation With Online Learning and Clustering on Misspecified Users. IEEE Transactions on Knowledge and Data Engineering 36(12), 7825-7838. DOI: 10.1109/TKDE.2024.3423442. Online publication date: Dec-2024.
https://doi.org/10.1109/TKDE.2024.3423442
Liu H, Jing L, Yu J, Ng M (2024). Learning Hierarchical Preferences for Recommendation With Mixture Intention Neural Stochastic Processes. IEEE Transactions on Knowledge and Data Engineering 36(7), 3237-3251. DOI: 10.1109/TKDE.2023.3348493. Online publication date: Jul-2024.
https://doi.org/10.1109/TKDE.2023.3348493
Yu D, Li Q, Wang X, Li Q, Xu G (2024). Counterfactual Explainable Conversational Recommendation. IEEE Transactions on Knowledge and Data Engineering 36(6), 2388-2400. DOI: 10.1109/TKDE.2023.3322403. Online publication date: Jun-2024.
https://doi.org/10.1109/TKDE.2023.3322403
Qin B, Huang Z, Wu Z, Wang C, Chen Y (2024). MetaGA: Metalearning With Graph-Attention for Improved Long-Tail Item Recommendation. IEEE Transactions on Computational Social Systems 11(5), 6544-6556. DOI: 10.1109/TCSS.2024.3411043. Online publication date: Oct-2024.
https://doi.org/10.1109/TCSS.2024.3411043
Du H, Li H, Peng Q, Fu L (2024). User Linguistic Style Awareness and Interest-Driven Conversational Recommender Systems. 2024 IEEE 4th International Conference on Digital Twins and Parallel Intelligence (DTPI), 215-220. DOI: 10.1109/DTPI61353.2024.10778861. Online publication date: 18-Oct-2024.
https://doi.org/10.1109/DTPI61353.2024.10778861
Li P, Tong X, Wang Y, Zhang Q (2024). Meta doubly robust: Debiasing CVR prediction via meta-learning with a small amount of unbiased data. Knowledge-Based Systems, 112898. DOI: 10.1016/j.knosys.2024.112898. Online publication date: Dec-2024.
https://doi.org/10.1016/j.knosys.2024.112898