research-article

Keeping Dataset Biases out of the Simulation: A Debiased Simulator for Reinforcement Learning based Recommender Systems

Authors:

Harrie Oosterhuis, Maarten de Rijke, Herke van Hoof

RecSys '20: Proceedings of the 14th ACM Conference on Recommender Systems

Pages 190 - 199

https://doi.org/10.1145/3383313.3412252

Published: 22 September 2020

Abstract

Reinforcement learning for recommendation (RL4Rec) methods are increasingly receiving attention as an effective way to improve long-term user engagement. However, applying RL4Rec online comes with risks: exploration may lead to periods of detrimental user experience. Moreover, few researchers have access to real-world recommender systems. Simulations have been put forward as a solution where user feedback is simulated based on logged historical user data, thus enabling optimization and evaluation without being run online. While simulators do not risk the user experience and are widely accessible, we identify an important limitation of existing simulation methods. They ignore the interaction biases present in logged user data, and consequently, these biases affect the resulting simulation. As a solution to this issue, we introduce a debiasing step in the simulation pipeline, which corrects for the biases present in the logged data before it is used to simulate user behavior. To evaluate the effects of bias on RL4Rec simulations, we propose a novel evaluation approach for simulators that considers the performance of policies optimized with the simulator. Our results reveal that the biases from logged data negatively impact the resulting policies, unless corrected for with our debiasing method. While our debiasing methods can be applied to any simulator, we make our complete pipeline publicly available as the Simulator for OFfline leArning and evaluation (SOFA): the first simulator that accounts for interaction biases prior to optimization and evaluation.
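The abstract does not spell out how the debiasing step corrects the logged data. As a minimal sketch, assuming the correction takes the form of inverse propensity scoring (a common technique for ratings that are missing not at random, and not necessarily the exact method used in SOFA), each observed rating can be reweighted by the inverse of its probability of having been observed. The function name ips_mse, the argument names, and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np


def ips_mse(ratings, predictions, observed, propensities):
    """Inverse-propensity-scored mean squared error over a user-item matrix.

    Hypothetical helper for illustration only (not taken from SOFA).
    ratings, predictions: arrays of shape (n_users, n_items)
    observed: boolean mask, True where a rating was logged
    propensities: assumed probability that each entry was observed
    """
    ratings = np.asarray(ratings, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    propensities = np.asarray(propensities, dtype=float)

    if np.any(propensities[observed] <= 0):
        raise ValueError("observed entries need strictly positive propensities")

    # Only observed entries contribute; each is scaled by 1 / propensity so
    # that rarely observed user-item pairs are not underrepresented.
    weighted = np.zeros_like(predictions)
    weighted[observed] = (
        (ratings[observed] - predictions[observed]) ** 2 / propensities[observed]
    )
    # Normalise by the full matrix size, not by the number of observations.
    return float(weighted.sum() / observed.size)


# Toy example: two users, two items, two logged ratings.
ratings = [[5, 0], [0, 1]]
predictions = [[4, 3], [3, 3]]
observed = [[True, False], [False, True]]
propensities = [[0.5, 0.5], [0.5, 0.25]]

loss = ips_mse(ratings, predictions, observed, propensities)
```

In this toy example the low-rated item has the smallest propensity, so its error receives the largest weight. A rating model fit with such a loss could then drive the simulated user feedback, although the propensities themselves would still have to be estimated or assumed.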

Index Terms
1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals

Published In

RecSys '20: Proceedings of the 14th ACM Conference on Recommender Systems

September 2020

796 pages

ISBN: 9781450375832

DOI: 10.1145/3383313

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

RecSys '20

RecSys '20: Fourteenth ACM Conference on Recommender Systems

September 22 - 26, 2020

Virtual Event, Brazil

Acceptance Rates

Overall Acceptance Rate 254 of 1,295 submissions, 20%

Article Metrics

Total Citations: 43
Total Downloads: 1,454
Downloads (last 12 months): 161
Downloads (last 6 weeks): 13

Reflects downloads up to 18 Oct 2024
