Interactive Recommendation with User-Specific Deep Reinforcement Learning

Published: 15 October 2019
Abstract

    In this article, we study a multi-step interactive recommendation problem for explicit-feedback recommender systems. Unlike existing work, we propose a novel user-specific deep reinforcement learning approach to the problem. Specifically, we first formulate the problem of interactive recommendation for each target user as a Markov decision process (MDP). We then derive a multi-MDP reinforcement learning task for all involved users. To model the possible relationships (including similarities and differences) between different users’ MDPs, we construct user-specific latent states using matrix factorization. After that, we propose a user-specific deep Q-learning (UDQN) method to estimate optimal policies based on the constructed user-specific latent states. Furthermore, we propose Biased UDQN (BUDQN) to explicitly model user-specific information by employing an additional bias parameter when estimating the Q-values for different users. Finally, we validate the effectiveness of our approach through comprehensive experimental results and analysis.
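    The paper defines UDQN and BUDQN formally; purely as an illustration, the sketch below shows one way the core idea could look in code: Q-values estimated from a user-specific latent state obtained via matrix factorization, with BUDQN adding a per-user bias to the Q-values. All names (UserSpecificQNetwork, latent_dim, and so on) are hypothetical, and the architecture is an assumption, not the authors' implementation.

        import torch
        import torch.nn as nn

        class UserSpecificQNetwork(nn.Module):
            """Hedged sketch of the UDQN/BUDQN idea (hypothetical names).

            The latent state is assumed to be the user's factor vector from
            matrix factorization; the network maps it to one Q-value per
            candidate item. The optional per-user bias corresponds to the
            BUDQN variant described in the abstract.
            """

            def __init__(self, latent_dim, n_items, n_users,
                         hidden=64, user_bias=False):
                super().__init__()
                self.q_net = nn.Sequential(
                    nn.Linear(latent_dim, hidden),
                    nn.ReLU(),
                    nn.Linear(hidden, n_items),  # one Q-value per item
                )
                # BUDQN: explicit user-specific bias added to every Q-value.
                self.user_bias = nn.Embedding(n_users, 1) if user_bias else None

            def forward(self, latent_state, user_id):
                q = self.q_net(latent_state)          # shape (batch, n_items)
                if self.user_bias is not None:
                    q = q + self.user_bias(user_id)   # broadcast (batch, 1) bias
                return q

    Under this reading, training would follow standard deep Q-learning with experience replay and a target network, with the MDPs of all users sharing one network (plus the per-user bias in the BUDQN case).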




    Reviews

    CK Raju

    Recommender systems are widely used, especially by online applications, to enhance user experience. In most conventional systems, a user's history of implicit online behavior is used to derive new recommendations. By enabling an explicit feedback mechanism with the user, would it be possible to design a reinforcement learning model that leads to better recommendations? This paper tests this hypothesis; the authors propose a new solution and validate their findings on real-world datasets. Inputs from a user's interactive sessions are used to model a Markov decision process (MDP), which the paper frames as a T-step interactive recommendation, each step denoting the user's response to a recommendation. The responses feed a reinforcement learning model, which uses them to learn a global policy by maximizing the cumulative reward it receives. A user-specific deep Q-learning method (christened UDQN) and a bias-incorporated variant (christened BUDQN) are formulated, where the constructed latent state is used as input and user responses to recommendations are used as output. Two MovieLens datasets and a Yahoo! music dataset are used as benchmarks to validate the experimental results; a sketch of the evaluation loop follows below. Tenfold cross-validation, with randomly selected training and test samples, minimizes the effect of overlapping data in the test sets. Both proposed methods, UDQN and BUDQN, are shown to achieve better results as recommender systems.
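    To make the T-step setup concrete, here is a minimal sketch of one interactive episode, assuming hypothetical agent and environment interfaces (reset, step, act, and observe are placeholders for illustration, not the paper's API):

        def run_episode(agent, env, user_id, T=10):
            """Hypothetical T-step interactive recommendation episode.

            At each step the agent recommends an item, the user returns an
            explicit rating used as the reward, and the return is the
            cumulative reward the learned policy tries to maximize.
            """
            state = env.reset(user_id)  # initial user-specific latent state
            total_reward = 0.0
            for _ in range(T):
                item = agent.act(state)                        # e.g., argmax of Q-values
                rating, next_state = env.step(user_id, item)   # explicit user feedback
                agent.observe(state, item, rating, next_state) # e.g., store for replay
                total_reward += rating
                state = next_state
            return total_reward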



    Published In

ACM Transactions on Knowledge Discovery from Data, Volume 13, Issue 6
    December 2019
    282 pages
    ISSN:1556-4681
    EISSN:1556-472X
    DOI:10.1145/3366748
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 15 October 2019
Accepted: 01 June 2019
Revised: 01 March 2019
Received: 01 February 2019
    Published in TKDD Volume 13, Issue 6


    Author Tags

    1. Interactive recommendation
    2. deep Q-learning
    3. deep reinforcement learning

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • The Hong Kong Polytechnic University
    • National Natural Science Foundation of China



    Cited By

    • (2024) CoBjeason: Reasoning Covered Object in Image by Multi-Agent Collaboration Based on Informed Knowledge Graph. ACM Transactions on Knowledge Discovery from Data 18(5), 1-56. DOI: 10.1145/3643565. Online publication date: 28-Feb-2024.
    • (2024) A Review of Explainable Recommender Systems Utilizing Knowledge Graphs and Reinforcement Learning. IEEE Access 12, 91999-92019. DOI: 10.1109/ACCESS.2024.3422416.
    • (2024) RLISR: A Deep Reinforcement Learning Based Interactive Service Recommendation Model. IEEE Access 12, 90204-90217. DOI: 10.1109/ACCESS.2024.3420395.
    • (2024) Towards long-term depolarized interactive recommendations. Information Processing & Management 61(6), 103833. DOI: 10.1016/j.ipm.2024.103833. Online publication date: Nov-2024.
    • (2024) DRL-HIFA: A dynamic recommendation system with deep reinforcement learning based Hidden Markov Weight Updation and factor analysis. Multimedia Tools and Applications. DOI: 10.1007/s11042-024-18296-8. Online publication date: 1-Mar-2024.
    • (2023) A Systematic Study on Reinforcement Learning Based Applications. Energies 16(3), 1512. DOI: 10.3390/en16031512. Online publication date: 3-Feb-2023.
    • (2023) Integrating Offline Reinforcement Learning with Transformers for Sequential Recommendation. Proceedings of the 17th ACM Conference on Recommender Systems, 1-1. DOI: 10.1145/3604915.3610641. Online publication date: 14-Sep-2023.
    • (2023) A Systematic Study on Reproducibility of Reinforcement Learning in Recommendation Systems. ACM Transactions on Recommender Systems 1(3), 1-23. DOI: 10.1145/3596519. Online publication date: 14-Jul-2023.
    • (2023) Enabling Goal-Focused Exploration of Podcasts in Interactive Recommender Systems. Proceedings of the 28th International Conference on Intelligent User Interfaces, 142-155. DOI: 10.1145/3581641.3584032. Online publication date: 27-Mar-2023.
    • (2023) A Deep Reinforcement Learning Recommender System With Multiple Policies for Recommendations. IEEE Transactions on Industrial Informatics 19(2), 2049-2061. DOI: 10.1109/TII.2022.3209290. Online publication date: Feb-2023.
