DOI: 10.1145/3639856.3639884
Research article

t-RELOAD: A REinforcement Learning-based Recommendation for Outcome-driven Application

Published: 17 May 2024

Abstract

Games of skill are an excellent source of entertainment, offering self-esteem, relaxation, and social gratification. Engagement on online skill-gaming platforms, however, depends heavily on outcomes and experience (e.g., wins and losses), and users behave differently under different win/loss experiences. Depending on the outcomes, intense engagement can lead to demotivation and consequent churn, while lighter engagement can build confidence and sustain users for longer. Generating relevant recommendations with reinforcement learning (RL) that also drive high engagement (in both intensity and duration) is non-trivial because: (i) early exploration through online RL using a combined multi-objective reward can permanently hurt users; and (ii) a simulation environment for evaluating RL policies is hard to model due to the (unknown) outcome-driven natural volatility in user behaviour. This work addresses the question: how can we leverage off-policy data for recommendation to solve the cold-start problem while ensuring reward-driven optimality from the platform's perspective in outcome-based applications? We introduce t-RELOAD, a REinforcement Learning-based REcommendation framework for Outcome-driven Applications built on a three-layer architecture: (i) off-policy data collection (through an already deployed solution), (ii) offline training (using relevancy), and (iii) online exploration with a turbo reward (t-reward, using engagement). We compare the performance of t-RELOAD with an XGBoost-based recommendation system already in place to demonstrate its effectiveness.
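The abstract only names the three layers; the sketch below is a minimal, hypothetical illustration of how such a pipeline could be wired together in Python. It is not the paper's algorithm: the logged records, user segments, item names, the tabular value estimate, and the engagement-weighted turbo_reward function are all assumptions standing in for the deployed recommender, the offline learner, and the paper's actual t-reward definition.

import random
from collections import defaultdict

# Layer 1: off-policy data collection (hypothetical logged interactions
# from an already deployed recommender): (user_segment, item, relevance_reward).
logged_data = [
    ("casual", "low_stake_game", 1.0),
    ("casual", "high_stake_game", 0.0),
    ("intense", "low_stake_game", 0.5),
    ("intense", "break_suggestion", 1.0),
]

# Layer 2: offline training on relevance only.
# A simple per-(segment, item) mean reward; a stand-in for the offline RL learner.
q = defaultdict(float)
counts = defaultdict(int)
for segment, item, reward in logged_data:
    counts[(segment, item)] += 1
    q[(segment, item)] += (reward - q[(segment, item)]) / counts[(segment, item)]

ITEMS = ["low_stake_game", "high_stake_game", "break_suggestion"]

def recommend(segment, epsilon=0.1):
    """Layer 3: online exploration, epsilon-greedy over the offline values."""
    if random.random() < epsilon:
        return random.choice(ITEMS)
    return max(ITEMS, key=lambda item: q[(segment, item)])

def turbo_reward(relevance, engagement_minutes, weight=0.05):
    """Assumed form of the combined reward: relevance plus a scaled engagement term."""
    return relevance + weight * engagement_minutes

def online_update(segment, item, relevance, engagement_minutes, lr=0.1):
    """Fold the observed turbo reward back into the value table."""
    r = turbo_reward(relevance, engagement_minutes)
    q[(segment, item)] += lr * (r - q[(segment, item)])

if __name__ == "__main__":
    item = recommend("casual")
    online_update("casual", item, relevance=1.0, engagement_minutes=20)
    print("recommended:", item, "updated value:", round(q[("casual", item)], 3))

The point this sketch illustrates is the ordering the abstract argues for: the online layer only perturbs a policy that has already been warm-started from logged (off-policy) data, so exploration with the combined reward never starts from a cold, uninformed state.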


Cited By

  • (2024) EFfECT-RL: Enabling Framework for Establishing Causality and Triggering engagement through RL. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 4836–4843. DOI: 10.1145/3627673.3680058. Online publication date: 21-Oct-2024.

Information

Published In

AIMLSystems '23: Proceedings of the Third International Conference on AI-ML Systems
October 2023
381 pages
ISBN: 9798400716492
DOI: 10.1145/3639856
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 May 2024

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

AIMLSystems 2023

Article Metrics

  • Downloads (last 12 months): 16
  • Downloads (last 6 weeks): 1
Reflects downloads up to 08 Feb 2025
