research-article

Open access

Unifying Online and Counterfactual Learning to Rank: A Novel Counterfactual Estimator that Effectively Utilizes Online Interventions

Authors:

Harrie Oosterhuis,

Maarten de Rijke

WSDM '21: Proceedings of the 14th ACM International Conference on Web Search and Data Mining

Pages 463 - 471

https://doi.org/10.1145/3437963.3441794

Published: 08 March 2021

Abstract

Optimizing ranking systems based on user interactions is a well-studied problem. State-of-the-art methods for this task are divided into online approaches, which learn by directly interacting with users, and counterfactual approaches, which learn from historical interactions. Existing online methods are hindered without online interventions and thus should not be applied counterfactually. Conversely, counterfactual methods cannot directly benefit from online interventions.

We propose a novel intervention-aware estimator for both counterfactual and online Learning to Rank (LTR). With the introduction of the intervention-aware estimator, we aim to bridge the online/counterfactual LTR division as it is shown to be highly effective in both online and counterfactual scenarios. The estimator corrects for the effect of position bias, trust bias, and item-selection bias by using corrections based on the behavior of the logging policy and on online interventions: changes to the logging policy made during the gathering of click data. Our experimental results, conducted in a semi-synthetic experimental setup, show that, unlike existing counterfactual LTR methods, the intervention-aware estimator can greatly benefit from online interventions.
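To make the idea of correcting with "the behavior of the logging policy and online interventions" more concrete, the following is a minimal sketch and not the paper's implementation. It assumes an affine click model, P(click | item at rank k) = alpha[k] * relevance + beta[k], which is one common way to model position and trust bias; the abstract itself does not state this model. The function name, the parameters `alpha`, `beta`, and `rank_probs_per_step`, and the example numbers are all hypothetical. The correction terms are averaged over every logging policy that was deployed while the clicks were gathered, so a change of policy (an intervention) changes the correction applied to all clicks.

```python
from typing import Sequence


def intervention_aware_relevance(
    clicks: Sequence[int],
    rank_probs_per_step: Sequence[Sequence[float]],
    alpha: Sequence[float],
    beta: Sequence[float],
) -> float:
    """Estimate one item's relevance from its logged clicks.

    Assumed click model (an assumption of this sketch, not taken from the
    abstract): P(click | item at rank k) = alpha[k] * relevance + beta[k].

    clicks[t] is 1 if the item was clicked at step t, else 0.
    rank_probs_per_step[t][k] is the probability that the logging policy
    deployed at step t showed the item at rank k. A row may sum to less
    than 1 when the item is sometimes not displayed at all (item selection).
    """
    n = len(clicks)
    if n == 0 or n != len(rank_probs_per_step):
        raise ValueError("need one rank distribution per logged interaction")

    # Expected alpha and beta for this item, averaged over all logging
    # policies used during data gathering, not only the one active at
    # the time of each individual click.
    alpha_bar = sum(
        sum(p * a for p, a in zip(probs, alpha)) for probs in rank_probs_per_step
    ) / n
    beta_bar = sum(
        sum(p * b for p, b in zip(probs, beta)) for probs in rank_probs_per_step
    ) / n

    if alpha_bar <= 0.0:
        raise ValueError("item was never exposed; relevance cannot be estimated")

    # Subtract the expected relevance-independent click rate, then
    # reweight by the expected exposure.
    return sum((c - beta_bar) / alpha_bar for c in clicks) / n


# Hypothetical example: the policy shows the item at rank 0 for two steps,
# then an intervention moves it to rank 1 for two steps.
example = intervention_aware_relevance(
    clicks=[1, 0, 1, 0],
    rank_probs_per_step=[[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]],
    alpha=[0.8, 0.4],
    beta=[0.1, 0.05],
)
print(round(example, 4))  # 0.7083
```

Under the assumed click model, the expected click rate averaged over all steps equals `alpha_bar * relevance + beta_bar`, so the estimate is unbiased as long as the item has a non-zero chance of being shown under at least one deployed policy.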

Index Terms

Information systems > Information retrieval > Retrieval models and ranking > Learning to rank


Information

Published In

WSDM '21: Proceedings of the 14th ACM International Conference on Web Search and Data Mining

March 2021

1192 pages

ISBN:9781450382977

DOI:10.1145/3437963

General Chairs: Liane Lewin-Eytan (Amazon, Israel), David Carmel (Amazon, Israel), Elad Yom-Tov (Microsoft, Israel)

Program Chairs: Eugene Agichtein (Emory University and Amazon, USA), Evgeniy Gabrilovich (Google Health, USA)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Badges

Best Paper

Funding Sources

Nederlandse Organisatie voor Wetenschappelijk Onderzoek

Conference

WSDM '21

WSDM '21: The Fourteenth ACM International Conference on Web Search and Data Mining

March 8 - 12, 2021

Virtual Event, Israel

Acceptance Rates

Overall Acceptance Rate 498 of 2,863 submissions, 17%

Bibliometrics

Total citations: 36
Total downloads: 1,068
Downloads (last 12 months): 161
Downloads (last 6 weeks): 26

Reflects downloads up to 12 Jan 2025

Cited By

Ovaisi Z, Saadatpanah P, Sefati S, Ohannessian M, Zheleva E (2024). Fairness of Interaction in Ranking under Position, Selection, and Trust Bias. ACM Transactions on Recommender Systems 3:2 (1-28). Online publication date: 6-Apr-2024.
https://dl.acm.org/doi/10.1145/3652864
Jeunen O, Oosterhuis H, Saito Y, Vasile F, Wang Y (2024). CONSEQUENCES --- The 3rd Workshop on Causality, Counterfactuals and Sequential Decision-Making for Recommender Systems. Proceedings of the 18th ACM Conference on Recommender Systems (1206-1209). Online publication date: 8-Oct-2024.
https://dl.acm.org/doi/10.1145/3640457.3687095
Gupta S, Oosterhuis H, de Rijke M, Serra E, Spezzano F (2024). Practical and Robust Safety Guarantees for Advanced Counterfactual Learning to Rank. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (737-747). Online publication date: 21-Oct-2024.
https://dl.acm.org/doi/10.1145/3627673.3679531
Hager P, Deffayet R, Renders J, Zoeter O, de Rijke M, Hui Yang G, Wang H, Han S, Hauff C, Zuccon G, Zhang Y (2024). Unbiased Learning to Rank Meets Reality: Lessons from Baidu's Large-Scale Search Dataset. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (1546-1556). Online publication date: 10-Jul-2024.
https://dl.acm.org/doi/10.1145/3626772.3657892
Gorantla S, Bhansali E, Deshpande A, Louis A, Hui Yang G, Wang H, Han S, Hauff C, Zuccon G, Zhang Y (2024). Optimizing Learning-to-Rank Models for Ex-Post Fair Relevance. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (1525-1534). Online publication date: 10-Jul-2024.
https://dl.acm.org/doi/10.1145/3626772.3657751
Gupta S, Hager P, Huang J, Vardasbi A, Oosterhuis H, Angélica L, Lattanzi S, Muñoz Medina A, Akoglu L, Gionis A, Vassilvitskii S (2024). Unbiased Learning to Rank: On Recent Advances and Practical Applications. Proceedings of the 17th ACM International Conference on Web Search and Data Mining (1118-1121). Online publication date: 4-Mar-2024.
https://dl.acm.org/doi/10.1145/3616855.3636451
Yang T, Han C, Luo C, Gupta P, Phillips J, Ai Q, Chua T, Ngo C, Ka-Wei Lee R, Kumar R, Lauw H (2024). Mitigating Exploitation Bias in Learning to Rank with an Uncertainty-aware Empirical Bayes Approach. Proceedings of the ACM Web Conference 2024 (1486-1496). Online publication date: 13-May-2024.
https://dl.acm.org/doi/10.1145/3589334.3645487
Khan S, Ugander J (2023). Adaptive normalization for IPW estimation. Journal of Causal Inference 11:1. Online publication date: 8-Feb-2023.
https://doi.org/10.1515/jci-2022-0019
Gupta S, Hager P, Oosterhuis H (2023). Recent Advancements in Unbiased Learning to Rank. Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation (145-148). Online publication date: 15-Dec-2023.
https://dl.acm.org/doi/10.1145/3632754.3632942
Zhao H, Xu J, Zhang X, Cai G, Dong Z, Wen J (2023). Unbiased Top-k Learning to Rank with Causal Likelihood Decomposition. Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region (129-138). Online publication date: 26-Nov-2023.
https://dl.acm.org/doi/10.1145/3624918.3625340
