Proximal policy optimization based hybrid recommender systems for large scale recommendations

Published in: Multimedia Tools and Applications

Abstract

Recommender systems have become increasingly popular due to the significant rise in digital information on the internet in recent years. They provide personalized recommendations by selecting a few items from a large set of candidates. However, as the number of items and users grows, scalability remains a key issue for recommender systems. Policy gradient algorithms such as Proximal Policy Optimization (PPO) have proven effective in large action spaces (a large number of items), as they learn the optimal policy directly from samples; yet most existing policy gradient approaches to recommendation suffer from high variance, which destabilizes the learning process. We model the collaborative filtering process as a Markov Decision Process and train a reinforcement learning agent with PPO, which uses the actor-critic framework and thereby mitigates the high variance of policy gradient methods. PPO methods are today considered among the most effective reinforcement learning methods, achieving state-of-the-art performance and even outperforming deep Q-learning. We further address the cold-start problem of collaborative filtering with autoencoder-based content filtering. In this paper, we propose a switching hybrid recommender system built from these two techniques: it switches between them according to a criterion, so that each constituent recommender's shortfall in a particular situation is covered by its counterpart. We show that our method outperforms various baseline methods on the popular MovieLens datasets across several evaluation metrics. On MovieLens 1M, our method outperforms the baseline by 9.19% in terms of R@10, and by 3.86% and 6.58% in terms of P@10 and P@20, respectively. On MovieLens 100k, it improves on the baseline methods by 4.10% in terms of P@10, and by 3.90% and 2.40% in terms of R@10 and R@20.
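The abstract attributes PPO's stability to the way it constrains policy updates; for context, the clipped surrogate objective that PPO maximizes (Schulman et al., 2017) is:

```latex
L^{\mathrm{CLIP}}(\theta)
  = \hat{\mathbb{E}}_t\!\left[
      \min\!\big(
        r_t(\theta)\,\hat{A}_t,\;
        \operatorname{clip}\!\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t
      \big)
    \right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
```

Here \(\hat{A}_t\) is the advantage estimate supplied by the critic and \(\epsilon\) is the clip range. Clipping the probability ratio \(r_t(\theta)\) keeps each policy update small, which is the variance-reduction mechanism the abstract refers to.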


[Figs. 1–7 and Algorithms 1–2 appear in the full article; captions are not available in this preview.]


Data Availability

All the data required for this research, i.e., the MovieLens 1M and MovieLens 100k datasets, is publicly available on the MovieLens website. The data can be accessed from https://grouplens.org/datasets/movielens/.
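The paper's trained models are not reproduced here, but the switching-hybrid idea it describes can be sketched on toy MovieLens-style data: a criterion (here, a hypothetical minimum number of observed ratings per user) routes cold-start users to a content/popularity fallback, while warm users are scored by the collaborative branch. The `policy_scores` function below is a deliberate placeholder (a similarity-weighted average) standing in for the PPO agent's scores, and the threshold value is an assumption, not the paper's criterion.

```python
import numpy as np

# Toy user-item rating matrix (rows: users, columns: items); 0 = unrated.
# A hypothetical stand-in for MovieLens data -- not the paper's pipeline.
ratings = np.array([
    [5.0, 3.0, 0.0, 1.0],
    [4.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 5.0, 4.0],
    [0.0, 3.0, 0.0, 0.0],   # cold-start user: only one observed rating
])

COLD_START_THRESHOLD = 2  # switching criterion: minimum rated items per user


def content_scores(ratings):
    """Fallback scorer: mean observed rating per item (popularity proxy)."""
    counts = np.maximum((ratings > 0).sum(axis=0), 1)
    return ratings.sum(axis=0) / counts


def policy_scores(user_idx, ratings):
    """Placeholder for the trained PPO policy's item scores.

    Here: cosine-similarity-weighted average of other users' ratings."""
    u = ratings[user_idx]
    sims = ratings @ u / (np.linalg.norm(ratings, axis=1) * np.linalg.norm(u) + 1e-8)
    sims[user_idx] = 0.0  # exclude the user's own row
    return sims @ ratings / (sims.sum() + 1e-8)


def recommend(user_idx, ratings, top_k=2):
    """Switching hybrid: content model for cold-start users, policy otherwise."""
    n_rated = int((ratings[user_idx] > 0).sum())
    if n_rated < COLD_START_THRESHOLD:
        scores = content_scores(ratings)           # cold-start branch
    else:
        scores = policy_scores(user_idx, ratings)  # collaborative/policy branch
    scores = np.where(ratings[user_idx] > 0, -np.inf, scores)  # mask seen items
    order = np.argsort(scores)[::-1]
    return [int(i) for i in order if np.isfinite(scores[i])][:top_k]


print(recommend(3, ratings))  # cold-start user -> content fallback
print(recommend(0, ratings))  # warm user -> policy branch
```

The switch itself is the only structural point being illustrated: because both branches expose the same "scores per item" interface, either constituent can cover the other's failure mode without changing the ranking code.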


Author information


Corresponding author

Correspondence to Vaibhav Padhye.

Ethics declarations

Conflict of Interests

All authors declare that they have no conflicts of interest, and this research received no funding.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Padhye, V., Lakshmanan, K. & Chaturvedi, A. Proximal policy optimization based hybrid recommender systems for large scale recommendations. Multimed Tools Appl 82, 20079–20100 (2023). https://doi.org/10.1007/s11042-022-14231-x
