DOI: 10.1145/3580305.3599447
Research article · Free access

Off-Policy Evaluation of Ranking Policies under Diverse User Behavior

Published: 04 August 2023

Abstract

Ranking interfaces are everywhere in online platforms. There is thus an ever-growing interest in their Off-Policy Evaluation (OPE), which aims to accurately evaluate the performance of ranking policies using logged data. The de facto approach to OPE is Inverse Propensity Scoring (IPS), which provides an unbiased and consistent value estimate. However, it becomes extremely inaccurate in the ranking setup due to its high variance under large action spaces. To deal with this problem, previous studies assume either independent or cascade user behavior, resulting in ranking versions of IPS. While these estimators are somewhat effective in reducing the variance, all existing estimators apply a single, universal assumption to every user, causing excessive bias and variance. This work therefore explores a far more general formulation in which user behavior is diverse and can vary depending on the user context. We show that the resulting estimator, which we call Adaptive IPS (AIPS), can be unbiased under any complex user behavior. Moreover, AIPS achieves the minimum variance among all unbiased estimators based on IPS. We further develop a procedure to identify the appropriate user behavior model that minimizes the mean squared error (MSE) of AIPS in a data-driven fashion. Extensive experiments demonstrate that the empirical accuracy improvement can be significant, enabling effective OPE of ranking systems even under diverse user behavior.
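To make the contrast between these weighting schemes concrete, below is a minimal Python sketch of position-wise importance weighting under the independent, cascade, and standard (full-ranking) assumptions, together with an adaptive estimator that switches between them per context. The per-position propensity arrays and the behavior_model callable are hypothetical illustrations based on the abstract's description, not the authors' actual implementation.

# Illustrative sketch of the position-wise IPS variants mentioned in the
# abstract (standard, independent, cascade) and an adaptive estimator that
# picks the behavior assumption per context. Assumes logged data with
# per-position propensities for factored policies; names such as
# behavior_model are hypothetical and not taken from the paper.
import numpy as np

def position_weights(pi_e: np.ndarray, pi_0: np.ndarray, behavior: str) -> np.ndarray:
    """Per-position importance weights for one logged ranking.

    pi_e, pi_0 : shape (K,) evaluation / logging propensities of the action
                 shown at each of the K positions.
    behavior   : 'independent' -> each position weighted by its own ratio,
                 'cascade'     -> position k weighted by ratios of positions 1..k,
                 'standard'    -> every position weighted by the full-ranking ratio.
    """
    ratios = pi_e / pi_0                              # pi_e(a_k|x) / pi_0(a_k|x)
    if behavior == "independent":
        return ratios
    if behavior == "cascade":
        return np.cumprod(ratios)                     # product over positions 1..k
    if behavior == "standard":
        return np.full_like(ratios, ratios.prod())    # full-ranking product at every position
    raise ValueError(f"unknown behavior model: {behavior}")

def adaptive_ips(logged_data, behavior_model) -> float:
    """Average the weighted per-position rewards, choosing the behavior
    assumption per context via the (hypothetical) behavior_model callable."""
    values = []
    for context, pi_e, pi_0, rewards in logged_data:
        w = position_weights(pi_e, pi_0, behavior_model(context))
        values.append(np.sum(w * rewards))            # weighted sum of position-level rewards
    return float(np.mean(values))

# Toy usage: two logged rankings of length 3, cascade behavior for every context.
data = [
    ("user_a", np.array([0.5, 0.4, 0.3]), np.array([0.3, 0.5, 0.2]), np.array([1, 0, 1])),
    ("user_b", np.array([0.2, 0.6, 0.5]), np.array([0.4, 0.3, 0.5]), np.array([0, 1, 0])),
]
print(adaptive_ips(data, behavior_model=lambda context: "cascade"))

The sketch shows why the choice matters: the standard weights multiply ratios across all positions and so have the largest variance, while the independent and cascade weights shrink the product to the positions the assumed behavior model says the reward depends on; an adaptive choice per context trades off this bias and variance.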

Supplementary Material

MP4 File (IDapfp0123-2min-promo.mp4)
Short promotion video.


Cited By

  • (2024) Effective Off-Policy Evaluation and Learning in Contextual Combinatorial Bandits. Proceedings of the 18th ACM Conference on Recommender Systems, 733-741. DOI: 10.1145/3640457.3688099. Online publication date: 8-Oct-2024.
  • (2024) Long-term Off-Policy Evaluation and Learning. Proceedings of the ACM Web Conference 2024, 3432-3443. DOI: 10.1145/3589334.3645446. Online publication date: 13-May-2024.
  • (2024) Off-Policy Evaluation of Slate Bandit Policies via Optimizing Abstraction. Proceedings of the ACM Web Conference 2024, 3150-3161. DOI: 10.1145/3589334.3645343. Online publication date: 13-May-2024.


Published In

KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2023, 5996 pages
ISBN: 9798400701030
DOI: 10.1145/3580305

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

1. inverse propensity score
2. off-policy evaluation
3. ranking policy


Conference

KDD '23

Acceptance Rates

Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%


Article Metrics

  • Downloads (Last 12 months): 207
  • Downloads (Last 6 weeks): 12

Reflects downloads up to 04 Feb 2025.

