DOI: 10.1145/3580305.3599447
Research article · Free access

Off-Policy Evaluation of Ranking Policies under Diverse User Behavior

Published: 04 August 2023

Abstract

Ranking interfaces are everywhere in online platforms. There is thus an ever-growing interest in their Off-Policy Evaluation (OPE), which aims to accurately evaluate the performance of ranking policies using logged data. The de facto approach to OPE is Inverse Propensity Scoring (IPS), which provides an unbiased and consistent value estimate. However, it becomes extremely inaccurate in the ranking setup due to its high variance under large action spaces. To deal with this problem, previous studies assume either independent or cascade user behavior, resulting in ranking versions of IPS. While these estimators are somewhat effective in reducing the variance, all existing estimators apply a single, universal assumption to every user, causing excessive bias and variance. This work therefore explores a far more general formulation in which user behavior is diverse and can vary depending on the user context. We show that the resulting estimator, which we call Adaptive IPS (AIPS), can be unbiased under any complex user behavior. Moreover, AIPS achieves the minimum variance among all unbiased estimators based on IPS. We further develop a procedure to identify the appropriate user behavior model that minimizes the mean squared error (MSE) of AIPS in a data-driven fashion. Extensive experiments demonstrate that the empirical accuracy improvement can be significant, enabling effective OPE of ranking systems even under diverse user behavior.
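To make the contrast between these weighting schemes concrete, below is a minimal Python sketch of position-wise importance weighting under the independent, cascade, and standard (full-ranking) assumptions, together with an adaptive estimator that switches between them per context. The per-position propensity arrays and the behavior_model callable are hypothetical illustrations based on the abstract's description, not the authors' actual implementation.

# Illustrative sketch of the position-wise IPS variants mentioned in the
# abstract (standard, independent, cascade) and an adaptive estimator that
# picks the behavior assumption per context. Assumes logged data with
# per-position propensities for factored policies; names such as
# behavior_model are hypothetical and not taken from the paper.
import numpy as np

def position_weights(pi_e: np.ndarray, pi_0: np.ndarray, behavior: str) -> np.ndarray:
    """Per-position importance weights for one logged ranking.

    pi_e, pi_0 : shape (K,) evaluation / logging propensities of the action
                 shown at each of the K positions.
    behavior   : 'independent' -> each position weighted by its own ratio,
                 'cascade'     -> position k weighted by ratios of positions 1..k,
                 'standard'    -> every position weighted by the full-ranking ratio.
    """
    ratios = pi_e / pi_0                              # pi_e(a_k|x) / pi_0(a_k|x)
    if behavior == "independent":
        return ratios
    if behavior == "cascade":
        return np.cumprod(ratios)                     # product over positions 1..k
    if behavior == "standard":
        return np.full_like(ratios, ratios.prod())    # full-ranking product at every position
    raise ValueError(f"unknown behavior model: {behavior}")

def adaptive_ips(logged_data, behavior_model) -> float:
    """Average the weighted per-position rewards, choosing the behavior
    assumption per context via the (hypothetical) behavior_model callable."""
    values = []
    for context, pi_e, pi_0, rewards in logged_data:
        w = position_weights(pi_e, pi_0, behavior_model(context))
        values.append(np.sum(w * rewards))            # weighted sum of position-level rewards
    return float(np.mean(values))

# Toy usage: two logged rankings of length 3, cascade behavior for every context.
data = [
    ("user_a", np.array([0.5, 0.4, 0.3]), np.array([0.3, 0.5, 0.2]), np.array([1, 0, 1])),
    ("user_b", np.array([0.2, 0.6, 0.5]), np.array([0.4, 0.3, 0.5]), np.array([0, 1, 0])),
]
print(adaptive_ips(data, behavior_model=lambda context: "cascade"))

The sketch shows why the choice matters: the standard weights multiply ratios across all positions and so have the largest variance, while the independent and cascade weights shrink the product to the positions the assumed behavior model says the reward depends on; an adaptive choice per context trades off this bias and variance.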

Supplementary Material

MP4 File (IDapfp0123-2min-promo.mp4)
Short promotion video.


Cited By

  • (2024) Effective Off-Policy Evaluation and Learning in Contextual Combinatorial Bandits. Proceedings of the 18th ACM Conference on Recommender Systems, 733-741. DOI: 10.1145/3640457.3688099. Online publication date: 8-Oct-2024.
  • (2024) Long-term Off-Policy Evaluation and Learning. Proceedings of the ACM Web Conference 2024, 3432-3443. DOI: 10.1145/3589334.3645446. Online publication date: 13-May-2024.
  • (2024) Off-Policy Evaluation of Slate Bandit Policies via Optimizing Abstraction. Proceedings of the ACM Web Conference 2024, 3150-3161. DOI: 10.1145/3589334.3645343. Online publication date: 13-May-2024.


Published In

KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2023, 5996 pages
ISBN: 9798400701030
DOI: 10.1145/3580305

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

1. inverse propensity score
2. off-policy evaluation
3. ranking policy


Conference

KDD '23

Acceptance Rates

Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%


Article Metrics

  • Downloads (Last 12 months): 207
  • Downloads (Last 6 weeks): 12

Reflects downloads up to 04 Feb 2025.

