DOI: 10.1145/3639856.3639884
Research article

t-RELOAD: A REinforcement Learning-based Recommendation for Outcome-driven Application

Published: 17 May 2024

Abstract

Games of skill are an excellent source of entertainment, offering self-esteem, relaxation, and social gratification. Engagement on online skill-gaming platforms, however, depends heavily on outcomes and experience (e.g., wins and losses), and users behave differently under different win/loss experiences. Depending on the outcomes, intense engagement can lead to demotivation and consequent churn, while lighter engagement can build confidence and sustain users for longer. Generating relevant recommendations with reinforcement learning (RL) that also drive high engagement (in both intensity and duration) is non-trivial because: (i) early exploration through online RL using a combined multi-objective reward can permanently hurt users; and (ii) a simulation environment for evaluating RL policies is hard to model due to the (unknown) outcome-driven natural volatility in user behaviour. This work addresses the question: how can we leverage off-policy data for recommendation to solve the cold-start problem while ensuring reward-driven optimality from the platform's perspective in outcome-based applications? We introduce t-RELOAD, a REinforcement Learning-based REcommendation framework for Outcome-driven Applications built on a three-layer architecture: (i) off-policy data collection (through an already deployed solution), (ii) offline training (using relevancy), and (iii) online exploration with a turbo reward (t-reward, using engagement). We compare the performance of t-RELOAD with an XGBoost-based recommendation system already in place to demonstrate its effectiveness.
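The abstract only names the three layers; the sketch below is a minimal, hypothetical illustration of how such a pipeline could be wired together in Python. It is not the paper's algorithm: the logged records, user segments, item names, the tabular value estimate, and the engagement-weighted turbo_reward function are all assumptions standing in for the deployed recommender, the offline learner, and the paper's actual t-reward definition.

import random
from collections import defaultdict

# Layer 1: off-policy data collection (hypothetical logged interactions
# from an already deployed recommender): (user_segment, item, relevance_reward).
logged_data = [
    ("casual", "low_stake_game", 1.0),
    ("casual", "high_stake_game", 0.0),
    ("intense", "low_stake_game", 0.5),
    ("intense", "break_suggestion", 1.0),
]

# Layer 2: offline training on relevance only.
# A simple per-(segment, item) mean reward; a stand-in for the offline RL learner.
q = defaultdict(float)
counts = defaultdict(int)
for segment, item, reward in logged_data:
    counts[(segment, item)] += 1
    q[(segment, item)] += (reward - q[(segment, item)]) / counts[(segment, item)]

ITEMS = ["low_stake_game", "high_stake_game", "break_suggestion"]

def recommend(segment, epsilon=0.1):
    """Layer 3: online exploration, epsilon-greedy over the offline values."""
    if random.random() < epsilon:
        return random.choice(ITEMS)
    return max(ITEMS, key=lambda item: q[(segment, item)])

def turbo_reward(relevance, engagement_minutes, weight=0.05):
    """Assumed form of the combined reward: relevance plus a scaled engagement term."""
    return relevance + weight * engagement_minutes

def online_update(segment, item, relevance, engagement_minutes, lr=0.1):
    """Fold the observed turbo reward back into the value table."""
    r = turbo_reward(relevance, engagement_minutes)
    q[(segment, item)] += lr * (r - q[(segment, item)])

if __name__ == "__main__":
    item = recommend("casual")
    online_update("casual", item, relevance=1.0, engagement_minutes=20)
    print("recommended:", item, "updated value:", round(q[("casual", item)], 3))

The point this sketch illustrates is the ordering the abstract argues for: the online layer only perturbs a policy that has already been warm-started from logged (off-policy) data, so exploration with the combined reward never starts from a cold, uninformed state.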


Cited By

  • (2024) EFfECT-RL: Enabling Framework for Establishing Causality and Triggering engagement through RL. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 4836–4843. DOI: 10.1145/3627673.3680058. Online publication date: 21-Oct-2024.

Information

Published In

AIMLSystems '23: Proceedings of the Third International Conference on AI-ML Systems
October 2023
381 pages
ISBN: 9798400716492
DOI: 10.1145/3639856
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 May 2024

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

AIMLSystems 2023

Article Metrics

  • Downloads (last 12 months): 16
  • Downloads (last 6 weeks): 1
Reflects downloads up to 08 Feb 2025
