Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3437963.3441794acmconferencesArticle/Chapter ViewAbstractPublication PageswsdmConference Proceedingsconference-collections
research-article
Open access

Unifying Online and Counterfactual Learning to Rank: A Novel Counterfactual Estimator that Effectively Utilizes Online Interventions

Published: 08 March 2021 Publication History

Abstract

Optimizing ranking systems based on user interactions is a well-studied problem. State-of-the-art methods for optimizing ranking systems based on user interactions are divided into online approaches - that learn by directly interacting with users - and counterfactual approaches - that learn from historical interactions. Existing online methods are hindered without online interventions and thus should not be applied counterfactually. Conversely, counterfactual methods cannot directly benefit from online interventions.
We propose a novel intervention-aware estimator for both counterfactual and online Learning to Rank (LTR). With the introduction of the intervention-aware estimator, we aim to bridge the online/counterfactual LTR division as it is shown to be highly effective in both online and counterfactual scenarios. The estimator corrects for the effect of position bias, trust bias, and item-selection bias by using corrections based on the behavior of the logging policy and on online interventions: changes to the logging policy made during the gathering of click data. Our experimental results, conducted in a semi-synthetic experimental setup, show that, unlike existing counterfactual LTR methods, the intervention-aware estimator can greatly benefit from online interventions.

References

[1]
Aman Agarwal, Kenta Takatsu, Ivan Zaitsev, and Thorsten Joachims. 2019 a. A General Framework for Counterfactual Learning-to-Rank. In Proceedings of the 42nd International ACM SIGIR Conference on Research & Development in Information Retrieval. ACM, 5--14.
[2]
Aman Agarwal, Xuanhui Wang, Cheng Li, Michael Bendersky, and Marc Najork. 2019 b. Addressing Trust Bias for Unbiased Learning-to-Rank. In The World Wide Web Conference. ACM, 4--14.
[3]
Aman Agarwal, Ivan Zaitsev, Xuanhui Wang, Cheng Li, Marc Najork, and Thorsten Joachims. 2019 c. Estimating Position Bias without Intrusive Interventions. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining. ACM, 474--482.
[4]
Qingyao Ai, Keping Bi, Cheng Luo, Jiafeng Guo, and W Bruce Croft. 2018. Unbiased Learning to Rank with Unbiased Propensity Estimation. In Proceedings of the 41st International ACM SIGIR conference on Research and Development in Information Retrieval. ACM, 385--394.
[5]
Wolf-Tilo Balke, Ulrich Güntzer, and Werner Kießling. 2002. On Real-Time Top k Querying for Mobile Services. In OTM Confederated International Conferences" On the Move to Meaningful Internet Systems". Springer, 125--143.
[6]
Christopher J.C. Burges. 2010. From RankNet to LambdaRank to LambdaMART: An Overview. Technical Report MSR-TR-2010--82. Microsoft.
[7]
Fei Cai and Maarten de Rijke. 2016. A Survey of Query Auto Completion in Information Retrieval. Foundations and Trends in Information Retrieval, Vol. 10, 4 (2016), 273--363.
[8]
Ben Carterette and Praveen Chandar. 2018. Offline Comparative Evaluation with Incremental, Minimally-Invasive Online Feedback. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval. ACM, 705--714.
[9]
Olivier Chapelle and Yi Chang. 2011. Yahoo! Learning to Rank Challenge Overview. Journal of Machine Learning Research, Vol. 14 (2011), 1--24.
[10]
Paolo Cremonesi, Yehuda Koren, and Roberto Turrin. 2010. Performance of Recommender Algorithms on Top-n Recommendation Tasks. In Proceedings of the fourth ACM conference on Recommender systems. ACM, 39--46.
[11]
Arthur P Dempster, Nan M Laird, and Donald B Rubin. 1977. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society: Series B (Methodological), Vol. 39, 1 (1977), 1--22.
[12]
Katja Hofmann, Anne Schuth, Shimon Whiteson, and Maarten de Rijke. 2013. Reusing Historical Interaction Data for Faster Online Learning to Rank for IR. In Proceedings of the sixth ACM international conference on Web search and data mining. ACM, 183--192.
[13]
Ziniu Hu, Yang Wang, Qu Peng, and Hang Li. 2019. Unbiased LambdaMART: An Unbiased Pairwise Learning-to-Rank Algorithm. In The World Wide Web Conference. ACM, 2830--2836.
[14]
Neil Hurley and Mi Zhang. 2011. Novelty and Diversity in Top-n Recommendation--Analysis and Evaluation. ACM Transactions on Internet Technology (TOIT), Vol. 10, 4 (2011), 14.
[15]
Rolf Jagerman, Harrie Oosterhuis, and Maarten de Rijke. 2019. To Model or to Intervene: A Comparison of Counterfactual and Online Learning to Rank from User Interactions. In Proceedings of the 42nd International ACM SIGIR Conference on Research & Development in Information Retrieval. ACM, 15--24.
[16]
Thorsten Joachims. 2002. Optimizing Search Engines Using Clickthrough Data. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 133--142.
[17]
Thorsten Joachims, Laura Granka, Bing Pan, Helene Hembrooke, and Geri Gay. 2005. Accurately Interpreting Clickthrough Data as Implicit Feedback. In SIGIR. ACM, 154--161.
[18]
Thorsten Joachims, Adith Swaminathan, and Tobias Schnabel. 2017. Unbiased Learning-to-Rank with Biased Feedback. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. ACM, 781--789.
[19]
Junpei Komiyama, Junya Honda, and Hiroshi Nakagawa. 2015. Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays. In Proceedings of the 32Nd International Conference on International Conference on Machine Learning - Volume 37 (ICML'15). JMLR.org, 1152--1161.
[20]
Paul Lagrée, Claire Vernade, and Olivier Cappé. 2016. Multiple-play Bandits in the Position-based Model. In Advances in Neural Information Processing Systems. 1597--1605.
[21]
Shuai Li, Yasin Abbasi-Yadkori, Branislav Kveton, S. Muthukrishnan, Vishwa Vinay, and Zheng Wen. 2018. Offline Evaluation of Ranking Policies with Click Models. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 1685--1694.
[22]
Qiang Liu, Lihong Li, Ziyang Tang, and Dengyong Zhou. 2018. Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation. In Advances in Neural Information Processing Systems. 5356--5366.
[23]
Tie-Yan Liu. 2009. Learning to Rank for Information Retrieval. Foundations and Trends in Information Retrieval, Vol. 3, 3 (2009), 225--331.
[24]
Harrie Oosterhuis and Maarten de Rijke. 2018. Differentiable Unbiased Online Learning to Rank. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management. ACM, 1293--1302.
[25]
Harrie Oosterhuis and Maarten de Rijke. 2019. Optimizing Ranking Models in an Online Setting. In Advances in Information Retrieval. Springer International Publishing, Cham, 382--396.
[26]
Tao Qin and Tie-Yan Liu. 2013. Introducing LETOR 4.0 datasets. arXiv preprint arXiv:1306.2597 (2013).
[27]
Tobias Schnabel, Adith Swaminathan, Ashudeep Singh, Navin Chandak, and Thorsten Joachims. 2016. Recommendations As Treatments: Debiasing Learning and Evaluation. In Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48 (ICML'16). JMLR.org, 1670--1679.
[28]
Igor Shalyminov, Ondvr ej Duvs ek, and Oliver Lemon. 2018. Neural Response Ranking for Social Conversation: A Data-Efficient Approach. In Proceedings of the 2018 EMNLP Workshop SCAI: The 2nd Int'l Workshop on Search-Oriented Conversational AI. 1--8.
[29]
Adith Swaminathan and Thorsten Joachims. 2015. The Self-Normalized Estimator for Counterfactual Learning. In Advances in Neural Information Processing Systems. 3231--3239.
[30]
Vladimir Vapnik. 2013. The Nature of Statistical Learning Theory .Springer Science & Business Media.
[31]
Akrivi Vlachou, Christos Doulkeridis, and Kjetil Nørvåg. 2011. Monitoring Reverse Top-k Queries over Mobile Devices. In Proceedings of the 10th ACM International Workshop on Data Engineering for Wireless and Mobile Access. ACM, 17--24.
[32]
Xuanhui Wang, Michael Bendersky, Donald Metzler, and Marc Najork. 2016. Learning to Rank with Selection Bias in Personal Search. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval. ACM, 115--124.
[33]
Xuanhui Wang, Nadav Golbandi, Michael Bendersky, Donald Metzler, and Marc Najork. 2018a. Position Bias Estimation for Unbiased Learning to Rank in Personal Search. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. ACM, 610--618.
[34]
Xuanhui Wang, Cheng Li, Nadav Golbandi, Michael Bendersky, and Marc Najork. 2018b. The LambdaLoss Framework for Ranking Metric Optimization. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management. ACM, 1313--1322.
[35]
Yisong Yue and Thorsten Joachims. 2009. Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem. In Proceedings of the 26th Annual International Conference on Machine Learning. ACM, 1201--1208.

Cited By

View all
  • (2024)Fairness of Interaction in Ranking under Position, Selection, and Trust BiasACM Transactions on Recommender Systems10.1145/36528643:2(1-28)Online publication date: 6-Apr-2024
  • (2024)CONSEQUENCES --- The 3rd Workshop on Causality, Counterfactuals and Sequential Decision-Making for Recommender SystemsProceedings of the 18th ACM Conference on Recommender Systems10.1145/3640457.3687095(1206-1209)Online publication date: 8-Oct-2024
  • (2024)Practical and Robust Safety Guarantees for Advanced Counterfactual Learning to RankProceedings of the 33rd ACM International Conference on Information and Knowledge Management10.1145/3627673.3679531(737-747)Online publication date: 21-Oct-2024
  • Show More Cited By

Index Terms

  1. Unifying Online and Counterfactual Learning to Rank: A Novel Counterfactual Estimator that Effectively Utilizes Online Interventions

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    WSDM '21: Proceedings of the 14th ACM International Conference on Web Search and Data Mining
    March 2021
    1192 pages
    ISBN:9781450382977
    DOI:10.1145/3437963
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 08 March 2021

    Permissions

    Request permissions for this article.

    Check for updates

    Badges

    • Best Paper

    Author Tags

    1. counterfactual estimation
    2. counterfactual learning to rank
    3. learning to rank
    4. online learning to rank
    5. unbiased learning to rank
    6. user interactions

    Qualifiers

    • Research-article

    Funding Sources

    Conference

    WSDM '21

    Acceptance Rates

    Overall Acceptance Rate 498 of 2,863 submissions, 17%

    Upcoming Conference

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)161
    • Downloads (Last 6 weeks)26
    Reflects downloads up to 12 Jan 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Fairness of Interaction in Ranking under Position, Selection, and Trust BiasACM Transactions on Recommender Systems10.1145/36528643:2(1-28)Online publication date: 6-Apr-2024
    • (2024)CONSEQUENCES --- The 3rd Workshop on Causality, Counterfactuals and Sequential Decision-Making for Recommender SystemsProceedings of the 18th ACM Conference on Recommender Systems10.1145/3640457.3687095(1206-1209)Online publication date: 8-Oct-2024
    • (2024)Practical and Robust Safety Guarantees for Advanced Counterfactual Learning to RankProceedings of the 33rd ACM International Conference on Information and Knowledge Management10.1145/3627673.3679531(737-747)Online publication date: 21-Oct-2024
    • (2024)Unbiased Learning to Rank Meets Reality: Lessons from Baidu's Large-Scale Search DatasetProceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3626772.3657892(1546-1556)Online publication date: 10-Jul-2024
    • (2024)Optimizing Learning-to-Rank Models for Ex-Post Fair RelevanceProceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3626772.3657751(1525-1534)Online publication date: 10-Jul-2024
    • (2024)Unbiased Learning to Rank: On Recent Advances and Practical ApplicationsProceedings of the 17th ACM International Conference on Web Search and Data Mining10.1145/3616855.3636451(1118-1121)Online publication date: 4-Mar-2024
    • (2024)Mitigating Exploitation Bias in Learning to Rank with an Uncertainty-aware Empirical Bayes ApproachProceedings of the ACM Web Conference 202410.1145/3589334.3645487(1486-1496)Online publication date: 13-May-2024
    • (2023)Adaptive normalization for IPW estimationJournal of Causal Inference10.1515/jci-2022-001911:1Online publication date: 8-Feb-2023
    • (2023)Recent Advancements in Unbiased Learning to RankProceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation10.1145/3632754.3632942(145-148)Online publication date: 15-Dec-2023
    • (2023)Unbiased Top-$k$ Learning to Rank with Causal Likelihood DecompositionProceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region10.1145/3624918.3625340(129-138)Online publication date: 26-Nov-2023
    • Show More Cited By

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Login options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media