research-article

Interactive Recommendation with User-Specific Deep Reinforcement Learning

Authors:

Yu Lei,

Wenjie LiAuthors Info & Claims

ACM Transactions on Knowledge Discovery from Data (TKDD), Volume 13, Issue 6

Article No.: 61, Pages 1 - 15

https://doi.org/10.1145/3359554

Published: 15 October 2019 Publication History

Get Access

Abstract

In this article, we study a multi-step interactive recommendation problem for explicit-feedback recommender systems. Different from the existing works, we propose a novel user-specific deep reinforcement learning approach to the problem. Specifically, we first formulate the problem of interactive recommendation for each target user as a Markov decision process (MDP). We then derive a multi-MDP reinforcement learning task for all involved users. To model the possible relationships (including similarities and differences) between different users’ MDPs, we construct user-specific latent states by using matrix factorization. After that, we propose a user-specific deep Q-learning (UDQN) method to estimate optimal policies based on the constructed user-specific latent states. Furthermore, we propose Biased UDQN (BUDQN) to explicitly model user-specific information by employing an additional bias parameter when estimating the Q-values for different users. Finally, we validate the effectiveness of our approach by comprehensive experimental results and analysis.

References

[1]

G. Adomavicius and A. Tuzhilin. 2005. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering 17, 6 (2005), 734--749.

Digital Library

Google Scholar

[2]

N. Alon, B. Awerbuch, Y. Azar, and B. Patt-Shamir. 2009. Tell me who I am: An interactive recommendation system. Theory of Computing Systems 45, 2 (2009), 261--279.

Digital Library

Google Scholar

[3]

P. Auer, N. Cesa-Bianchi, and P. Fischer. 2002. Finite-time analysis of the multiarmed bandit problem. Machine Learning 47, 2--3 (2002), 235--256.

Digital Library

Google Scholar

[4]

R. Bellman. 1966. Dynamic programming. Science 153, 3731 (1966), 34--37.

Google Scholar

[5]

D. A. Berry and B. Fristedt. 1985. Bandit problems: Sequential allocation of experiments. Monographs on Statistics and Applied Probability 5 (1985), 71--87.

Google Scholar

[6]

S. Chen, Y. Yu, Q. Da, J. Tan, H. Huang, and H. Tang. 2018. Stabilizing reinforcement learning in dynamic environment with application to online recommendation. In SIGKDD. ACM, 1187--1196.

Google Scholar

[7]

K. Christakopoulou, F. Radlinski, and K. Hofmann. 2016. Towards conversational recommender systems. In SIGKDD. ACM, 815--824.

Google Scholar

[8]

P. Cremonesi, Y. Koren, and R. Turrin. 2010. Performance of recommender algorithms on top-n recommendation tasks. In RecSys. ACM, 39--46.

Google Scholar

[9]

M. A. Domingues, F. Gouyon, A. M. Jorge, J. P. Leal, J. Vinagre, L. Lemos, and M. Sordo. 2013. Combining usage and content in an online recommendation system for music in the long tail. International Journal of Multimedia Information Retrieval 2, 1 (2013), 3--13.

Crossref

Google Scholar

[10]

J. Gittins, K. Glazebrook, and R. Weber. 2011. Multi-armed Bandit Allocation Indices. John Wiley 8 Sons.

Google Scholar

[11]

X. He, H. Zhang, M. Kan, and T. Chua. 2016. Fast matrix factorization for online recommendation with implicit feedback. In SIGIR. ACM, 549--558.

Google Scholar

[12]

B. Hu, C. Shi, and J. Liu. 2017. Playlist recommendation based on reinforcement learning. In ICIS. Springer, 172--182.

Google Scholar

[13]

J. Kawale, H. H. Bui, B. Kveton, L. Tran-Thanh, and S. Chawla. 2015. Efficient thompson sampling for online matrix-factorization recommendation. In NIPS. 1297--1305.

Google Scholar

[14]

Y. Koren. 2008. Factorization meets the neighborhood: A multifaceted collaborative filtering model. In SIGKDD. ACM, 426--434.

Google Scholar

[15]

Y. Koren and R. Bell. 2015. Advances in collaborative filtering. In Recommender Systems Handbook. Springer, 77--118.

Google Scholar

[16]

Y. Koren, R. Bell, and C. Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer 42, 8 (2009), 30--37.

Digital Library

Google Scholar

[17]

L. Li, W. Chu, J. Langford, and R. E. Schapire. 2010. A contextual-bandit approach to personalized news article recommendation. In WWW. ACM, 661--670.

Google Scholar

[18]

L. Lin. 1993. Reinforcement Learning for Robots Using Neural Networks. Technical Report. School of Computer Science, Carnegie-Mellon University Pittsburgh, PA.

Google Scholar

[19]

V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis. 2015. Human-level control through deep reinforcement learning. Nature 518, 7540 (2015), 529--533.

Google Scholar

[20]

R. Pálovics, A. A. Benczúr, L. Kocsis, T. Kiss, and E. Frigó. 2014. Exploiting temporal influence in online recommendation. In RecSys. ACM, 273--280.

Google Scholar

[21]

F. Ricci, L. Rokach, and B. Shapira. 2011. Introduction to recommender systems handbook. In Recommender Systems Handbook. Springer, 1--35.

Google Scholar

[22]

G. Shani, D. Heckerman, and R. I. Brafman. 2005. An MDP-based recommender system. Journal of Machine Learning Research 6, Sep (2005), 1265--1295.

Digital Library

Google Scholar

[23]

D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis. 2016. Mastering the game of Go with deep neural networks and tree search. Nature 529, 7587 (2016), 484--489.

Google Scholar

[24]

R. S. Sutton and A. G. Barto. 1998. Reinforcement Learning: An Introduction, Vol. 1. MIT Press Cambridge.

Digital Library

Google Scholar

[25]

N. Taghipour, A. Kardan, and S. S. Ghidary. 2007. Usage-based web recommendations: A reinforcement learning approach. In RecSys. ACM, 113--120.

Google Scholar

[26]

L. Tang, Y. Jiang, L. Li, C. Zeng, and T. Li. 2015. Personalized recommendation via parameter-free contextual bandits. In SIGIR. ACM, 323--332.

Google Scholar

[27]

H. P. Vanchinathan, I. Nikolic, F. De Bona, and A. Krause. 2014. Explore-exploit in top-n recommender systems via gaussian processes. In RecSys. ACM, 225--232.

Google Scholar

[28]

X. Wang, Y. Wang, D. Hsu, and Y. Wang. 2014. Exploration in interactive personalized music recommendation: A reinforcement learning approach. ACM Transactions on Multimedia Computing, Communications, and Applications 11, 1 (2014), 7.

Digital Library

Google Scholar

[29]

C. J. Watkins and P. Dayan. 1992. Q-learning. Machine Learning 8, 3--4 (1992), 279--292.

Digital Library

Google Scholar

[30]

X. Zhao, L. Xia, L. Zhang, Z. Ding, D. Yin, and J. Tang. 2018. Deep reinforcement learning for page-wise recommendations. In RecSys. ACM, 95--103.

Google Scholar

[31]

X. Zhao, L. Zhang, Z. Ding, L. Xia, J. Tang, and D. Yin. 2018. Recommendations with negative feedback via pairwise deep reinforcement learning. In SIGKDD. ACM, 1040--1048.

Google Scholar

[32]

X. Zhao, W. Zhang, and J. Wang. 2013. Interactive collaborative filtering. In CIKM. ACM, 1411--1420.

Google Scholar

[33]

G. Zheng, F. Zhang, Z. Zheng, Y. Xiang, N. J. Yuan, X. Xie, and Z. Li. 2018. DRN: A deep reinforcement learning framework for news recommendation. In WWW. IW3C2, 167--176.

Digital Library

Google Scholar

Cited By

View all

Rong HQian MMa TJin DSheng V(2024)CoBjeason: Reasoning Covered Object in Image by Multi-Agent Collaboration Based on Informed Knowledge GraphACM Transactions on Knowledge Discovery from Data10.1145/364356518:5(1-56)Online publication date: 28-Feb-2024
https://dl.acm.org/doi/10.1145/3643565
Tiwary NMohd Noah SFauzi FYee T(2024)A Review of Explainable Recommender Systems Utilizing Knowledge Graphs and Reinforcement LearningIEEE Access10.1109/ACCESS.2024.342241612(91999-92019)Online publication date: 2024
https://doi.org/10.1109/ACCESS.2024.3422416
Zhang MQu YLi YWen XZhou Y(2024)RLISR: A Deep Reinforcement Learning Based Interactive Service Recommendation ModelIEEE Access10.1109/ACCESS.2024.342039512(90204-90217)Online publication date: 2024
https://doi.org/10.1109/ACCESS.2024.3420395
Show More Cited By

Index Terms

Interactive Recommendation with User-Specific Deep Reinforcement Learning
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Reinforcement learning
    2. Machine learning approaches
      1. Markov decision processes
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Recommender systems

Recommendations

Deep reinforcement learning for page-wise recommendations
RecSys '18: Proceedings of the 12th ACM Conference on Recommender Systems

Recommender systems can mitigate the information overload problem by suggesting users' personalized items. In real-world recommendations such as e-commerce, a typical interaction between the system and its users is - users are recommended a page of ...
Diversity-Promoting Deep Reinforcement Learning for Interactive Recommendation
ICCSE '21: 5th International Conference on Crowd Science and Engineering

The interactive recommendation systems have recently attracted lots of research attentions, because they can model the interactions between the user and the recommender system. Most previous interactive recommendation methods only focus on optimizing ...
Deep Reinforcement Learning: From Q-Learning to Deep Q-Learning
Neural Information Processing
Abstract
As the two hottest branches of machine learning, deep learning and reinforcement learning both play a vital role in the field of artificial intelligence. Combining deep learning with reinforcement learning, deep reinforcement learning is a method ...

Reviews

Reviewer: CK Raju

Recommender systems are widely used, especially by online applications with a view to enhancing user experience. In most conventional systems, past history of a user's implicit online behavior is used to derive a new recommendation. By enabling an explicit feedback mechanism with the user, would it be possible to design a reinforcement learning model that could lead to better recommendations This paper tests this hypothesis, and the authors suggest a new solution and validate their findings on real-world datasets. Inputs from a user's interactive system are used to model a Markov decision process (MDP), which the paper labels as a T -step interactive recommendation-each step denoting response to a recommendation from the user. The responses are used in a reinforcement learning model, which uses it to learn a global policy by maximizing the cumulative reward it receives. A user-specific deep Q-learning method (christened UDQN) and a bias-incorporated UDQN (christened BUDQN) are formulated, where the existing latent state is used as input and user responses to recommendations are used as output. Two different MovieLens datasets and a Yahoo! music dataset are used as benchmarking datasets to validate the experimental results. Cross-validation aspects are taken care of by using tenfold cross-validation in randomly selecting different samples for training and testing datasets to minimize the effects of overlapping data in test sets. Both of the proposed UDQN and BUDQN methods are seen to achieve better results as a recommender system.

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Knowledge Discovery from Data

ACM Transactions on Knowledge Discovery from Data Volume 13, Issue 6

December 2019

282 pages

ISSN:1556-4681

EISSN:1556-472X

DOI:10.1145/3366748

Editors:
Charu Aggarwal
IBM T. J. Watson Research, USA
,
Xindong Wu
Minginglamp Academy of Sciences, China

Issue’s Table of Contents

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 October 2019

Accepted: 01 June 2009

Revised: 01 March 2009

Received: 01 February 2007

Published in TKDD Volume 13, Issue 6

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed

Funding Sources

The Hong Kong Polytechnic University
National Natural Science Foundation of China

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

28
Total Citations
View Citations
1,015
Total Downloads

Downloads (Last 12 months)104
Downloads (Last 6 weeks)10

Reflects downloads up to 11 Aug 2024

Other Metrics

View Author Metrics

Citations

Cited By

View all

Rong HQian MMa TJin DSheng V(2024)CoBjeason: Reasoning Covered Object in Image by Multi-Agent Collaboration Based on Informed Knowledge GraphACM Transactions on Knowledge Discovery from Data10.1145/364356518:5(1-56)Online publication date: 28-Feb-2024
https://dl.acm.org/doi/10.1145/3643565
Tiwary NMohd Noah SFauzi FYee T(2024)A Review of Explainable Recommender Systems Utilizing Knowledge Graphs and Reinforcement LearningIEEE Access10.1109/ACCESS.2024.342241612(91999-92019)Online publication date: 2024
https://doi.org/10.1109/ACCESS.2024.3422416
Zhang MQu YLi YWen XZhou Y(2024)RLISR: A Deep Reinforcement Learning Based Interactive Service Recommendation ModelIEEE Access10.1109/ACCESS.2024.342039512(90204-90217)Online publication date: 2024
https://doi.org/10.1109/ACCESS.2024.3420395
Lechiakh MEl-Moutaouakkil ZMaurer A(2024)Towards long-term depolarized interactive recommendationsInformation Processing & Management10.1016/j.ipm.2024.10383361:6(103833)Online publication date: Nov-2024
https://doi.org/10.1016/j.ipm.2024.103833
S KShyam G(2024)DRL-HIFA: a dynamic recommendation system with deep reinforcement learning based Hidden Markov Weight Updation and factor analysisMultimedia Tools and Applications10.1007/s11042-024-18296-8Online publication date: 1-Mar-2024
https://doi.org/10.1007/s11042-024-18296-8
Sivamayil KRajasekar EAljafari BNikolovski SVairavasundaram SVairavasundaram I(2023)A Systematic Study on Reinforcement Learning Based ApplicationsEnergies10.3390/en1603151216:3(1512)Online publication date: 3-Feb-2023
https://doi.org/10.3390/en16031512
Xi XZhao YLiu QOuyang LWu Y(2023)Integrating Offline Reinforcement Learning with Transformers for Sequential RecommendationProceedings of the 17th ACM Conference on Recommender Systems10.1145/3604915.3610641(1-1)Online publication date: 14-Sep-2023
https://dl.acm.org/doi/10.1145/3604915.3610641
Cavenaghi ESottocornola GStella FZanker M(2023)A Systematic Study on Reproducibility of Reinforcement Learning in Recommendation SystemsACM Transactions on Recommender Systems10.1145/35965191:3(1-23)Online publication date: 14-Jul-2023
https://dl.acm.org/doi/10.1145/3596519
Liang YPonnada ALamere PDaskalova N(2023)Enabling Goal-Focused Exploration of Podcasts in Interactive Recommender SystemsProceedings of the 28th International Conference on Intelligent User Interfaces10.1145/3581641.3584032(142-155)Online publication date: 27-Mar-2023
https://dl.acm.org/doi/10.1145/3581641.3584032
Fu MHuang LRao AIrissappane AZhang JQu H(2023)A Deep Reinforcement Learning Recommender System With Multiple Policies for RecommendationsIEEE Transactions on Industrial Informatics10.1109/TII.2022.320929019:2(2049-2061)Online publication date: Feb-2023
https://doi.org/10.1109/TII.2022.3209290
Show More Cited By

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

HTML Format

View this article in HTML Format.

HTML Format

Abstract

References

Cited By

Index Terms

Recommendations

Deep reinforcement learning for page-wise recommendations

Diversity-Promoting Deep Reinforcement Learning for Interactive Recommendation

Deep Reinforcement Learning: From Q-Learning to Deep Q-Learning

Reviews

Access critical reviews of Computing literature here

Comments

Information

Published In

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Funding Sources

Contributors

Other Metrics

Bibliometrics

Article Metrics

Other Metrics

Citations

Cited By

Get Access

Login options

Full Access

View options

PDF

eReader

HTML Format

Figures

Other

Share

Share this Publication link

Share on social media

Affiliations