Abstract
Recommender systems have become increasingly popular owing to the significant growth of digital information on the internet in recent years. They provide personalized recommendations by selecting a few items from a very large item set. However, as the numbers of items and users grow, scalability remains a key challenge for recommender systems. Policy gradient algorithms such as Proximal Policy Optimization (PPO) have proven effective in large action spaces (i.e., with a large number of items), since they learn the optimal policy directly from samples; PPO is today considered among the most effective reinforcement learning methods, achieving state-of-the-art performance and even outperforming deep Q-learning methods. Most existing policy gradient approaches to recommendation, however, suffer from high variance, which destabilizes the learning process. We model the collaborative filtering process as a Markov decision process and train a reinforcement learning agent with PPO, whose actor-critic framework mitigates the high variance of policy gradient algorithms. We further address the cold-start issue in collaborative filtering with autoencoder-based content filtering. In this paper, we propose a switching hybrid recommender system that combines these two recommendation techniques: a switching hybrid system switches between techniques according to some criterion, so that the shortfall of one constituent recommender can be covered by its counterpart in a given situation. We show that our method outperforms various baseline methods on the popular MovieLens datasets across several evaluation metrics. On MovieLens 1M, our method outperforms the baseline by 9.19% in terms of R@10, and by 3.86% and 6.58% in terms of P@10 and P@20, respectively.
On the MovieLens 100K dataset, our method improves on the baseline methods by 4.10% in terms of P@10, and by 3.90% and 2.40% in terms of R@10 and R@20.
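The switching behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the interaction-count threshold used as the switching criterion, the placeholder collaborative policy, and the cosine-similarity content scorer are all assumptions introduced here for illustration.

```python
import math

# Assumed switching criterion: users with fewer logged interactions than
# this threshold are treated as cold-start and routed to content filtering.
COLD_START_THRESHOLD = 5

def collaborative_policy(user_history, n_items, k):
    """Stand-in for the PPO-trained collaborative policy.

    Placeholder logic: recommend the first k unseen item ids. A real
    policy would score items from the learned actor network instead.
    """
    seen = set(user_history)
    return [i for i in range(n_items) if i not in seen][:k]

def content_based(user_profile, item_vectors, k):
    """Stand-in for autoencoder-based content filtering.

    Scores each item vector against the user profile by cosine
    similarity and returns the k best-matching item ids.
    """
    def cos(a, b):
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return num / den if den else 0.0

    scores = [(cos(user_profile, v), idx) for idx, v in enumerate(item_vectors)]
    return [idx for _, idx in sorted(scores, reverse=True)[:k]]

def recommend(user_history, user_profile, item_vectors, k=10):
    """Switching hybrid: content filtering for cold-start users,
    the collaborative policy otherwise."""
    if len(user_history) < COLD_START_THRESHOLD:
        return content_based(user_profile, item_vectors, k)
    return collaborative_policy(user_history, len(item_vectors), k)
```

In this sketch the switch fires on a simple interaction count, which is one natural criterion for the cold-start situation the abstract mentions; any criterion that detects when collaborative signals are too sparse could be substituted.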
Data Availability
All the data required for this research, i.e., the MovieLens 1M and MovieLens 100K datasets, are publicly available on the MovieLens website. The data can be accessed from https://grouplens.org/datasets/movielens/.
Ethics declarations
Conflict of Interests
All authors declare that they have no conflicts of interest and that this research received no funding.
About this article
Cite this article
Padhye, V., Lakshmanan, K. & Chaturvedi, A. Proximal policy optimization based hybrid recommender systems for large scale recommendations. Multimed Tools Appl 82, 20079–20100 (2023). https://doi.org/10.1007/s11042-022-14231-x