research-article

Open access

Unbiased Learning-to-Rank Needs Unconfounded Propensity Estimation

Authors:

Brian D. DavisonAuthors Info & Claims

SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pages 1535 - 1545

https://doi.org/10.1145/3626772.3657772

Published: 11 July 2024 Publication History

Abstract

The logs of the use of a search engine provide sufficient data to train a better ranker. However, it is well known that such implicit feedback reflects biases, and in particular a presentation bias that favors higher-ranked results. Unbiased Learning-to-Rank (ULTR) methods attempt to optimize performance by jointly modeling this bias along with the ranker so that the bias can be removed. Such methods have been shown to provide theoretical soundness, and promise superior performance and low deployment costs. However, existing ULTR methods don't recognize that query-document relevance is a confounder -- it affects both the likelihood of a result being clicked because of relevance and the likelihood of the result being ranked high by the base ranker. Moreover, the performance guarantees of existing ULTR methods assume the use of a weak ranker -- one that does a poor job of ranking documents based on relevance to a query. In practice, of course, commercial search engines use highly tuned rankers, and desire to improve upon them using the implicit judgments in search logs. This results in a significant correlation between position and relevance, which leads existing ULTR methods to overestimate click propensities in highly ranked results, reducing ULTR's effectiveness. This paper is the first to demonstrate the problem of propensity overestimation by ULTR algorithms, based on a causal analysis. We develop a new learning objective based on a backdoor adjustment. In addition, we introduce the Logging-Policy-aware Propensity (LPP) model that can jointly learn LPP and a more accurate ranker. We extensively test our approach on two public benchmark tasks and show that our proposal is effective, practical and significantly outperforms the state of the art.

References

[1]

Qingyao Ai, Keping Bi, Jiafeng Guo, and W. Bruce Croft. 2018a. Learning a Deep Listwise Context Model for Ranking Refinement. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, SIGIR 2018, Ann Arbor, MI, USA, July 08--12, 2019.

[2]

Qingyao Ai, Keping Bi, Cheng Luo, Jiafeng Guo, and W. Bruce Croft. 2018b. Unbiased Learning to Rank with Unbiased Propensity Estimation. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, SIGIR 2018, Ann Arbor, MI, USA, July 08--12, 2019.

[3]

Qingyao Ai, Tao Yang, Huazheng Wang, and Jiaxin Mao. 2021. Unbiased Learning to Rank: Online or Offline? ACM Trans. Inf. Syst., Vol. 39, 2 (2021), 21:1--21:29.

Digital Library

[4]

Yinqiong Cai, Jiafeng Guo, Yixing Fan, Qingyao Ai, Ruqing Zhang, and Xueqi Cheng. 2022. Hard Negatives or False Negatives: Correcting Pooling Bias in Training Neural Ranking Models. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, Atlanta, GA, USA, October 17--21, 2022.

Digital Library

[5]

Olivier Chapelle and Yi Chang. 2011. Yahoo! Learning to Rank Challenge Overview. In Proceedings of the Yahoo! Learning to Rank Challenge, held at ICML 2010, Haifa, Israel, June 25, 2010.

[6]

Olivier Chapelle, Donald Metlzer, Ya Zhang, and Pierre Grinspan. 2009. Expected reciprocal rank for graded relevance. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, CIKM 2009, Hong Kong, China, November 2--6, 2009.

Digital Library

[7]

Mouxiang Chen, Chenghao Liu, Zemin Liu, and Jianling Sun. 2022a. LBD: Decouple Relevance and Observation for Individual-Level Unbiased Learning to Rank. In Advances in Neural Information Processing Systems, Vol. 35.

[8]

Mouxiang Chen, Chenghao Liu, Zemin Liu, and Jianling Sun. 2022b. Scalar is Not Enough: Vectorization-based Unbiased Learning to Rank. In KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14 - 18, 2022, Aidong Zhang and Huzefa Rangwala (Eds.). ACM, 136--145.

Digital Library

[9]

Mouxiang Chen, Chenghao Liu, Jianling Sun, and Steven C. H. Hoi. 2021. Adapting Interactional Observation Embedding for Counterfactual Learning to Rank. In SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, Canada, July 11--15, 2021.

Digital Library

[10]

Konstantina Christakopoulou, Madeleine Traverse, Trevor Potter, Emma Marriott, Daniel Li, Chris Haulk, Ed H. Chi, and Minmin Chen. 2020. Deconfounding User Satisfaction Estimation from Response Rate Bias. In RecSys 2020: Fourteenth ACM Conference on Recommender Systems, Virtual Event, Brazil, September 22--26, 2020.

[11]

Aleksandr Chuklin, Ilya Markov, and Maarten de Rijke. 2016. Click Models for Web Search and their Applications to IR: WSDM 2016 Tutorial. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, San Francisco, CA, USA, February 22--25, 2016.

Digital Library

[12]

Djork-Arné Clevert, Thomas Unterthiner, and Sepp Hochreiter. 2016. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs). In 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2--4, 2016, Conference Track Proceedings.

[13]

Nick Craswell, Onno Zoeter, Michael J. Taylor, and Bill Ramsey. 2008. An experimental comparison of click position-bias models. In Proceedings of the International Conference on Web Search and Web Data Mining, WSDM 2008, Palo Alto, California, USA, February 11--12, 2009.

Digital Library

[14]

Romain Deffayet, Philipp Hager, Jean-Michel Renders, and Maarten de Rijke. 2023 a. An Offline Metric for the Debiasedness of Click Models. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2023, Taipei, Taiwan, July 23--27, 2023. ACM, 558--568.

Digital Library

[15]

Romain Deffayet, Jean-Michel Renders, and Maarten de Rijke. 2023 b. Evaluating the Robustness of Click Models to Policy Distributional Shift. ACM Trans. Inf. Syst., Vol. 41, 4 (2023), 84:1--84:28.

[16]

John C. Duchi, Elad Hazan, and Yoram Singer. 2011. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. J. Mach. Learn. Res. (2011).

[17]

Priyanka Gupta, Ankit Sharma, Pankaj Malhotra, Lovekesh Vig, and Gautam Shroff. 2021. CauSeR: Causal Session-based Recommendations for Handling Popularity Bias. In CIKM '21: The 30th ACM International Conference on Information and Knowledge Management, Virtual Event, Queensland, Australia, November 1 - 5, 2021.

[18]

Ziniu Hu, Yang Wang, Qu Peng, and Hang Li. 2019. Unbiased LambdaMART: An Unbiased Pairwise Learning-to-Rank Algorithm. In The World Wide Web Conference, WWW 2019, San Francisco, CA, USA, May 13--17, 2019.

[19]

Kalervo J"a rvelin and Jaana Kek"a l"a inen. 2002. Cumulated gain-based evaluation of IR techniques. ACM Trans. Inf. Syst. (2002).

[20]

Thorsten Joachims. 2002. Optimizing search engines using clickthrough data. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July 23--26, 2002, Edmonton, Alberta, Canada. ACM, 133--142.

Digital Library

[21]

Thorsten Joachims. 2006. Training linear SVMs in linear time. In Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA, August 20--23, 2006.

Digital Library

[22]

Thorsten Joachims, Laura A. Granka, Bing Pan, Helene Hembrooke, and Geri Gay. 2005. Accurately interpreting clickthrough data as implicit feedback. In SIGIR 2005: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Salvador, Brazil, August 15--19, 2005.

Digital Library

[23]

Thorsten Joachims, Laura A. Granka, Bing Pan, Helene Hembrooke, Filip Radlinski, and Geri Gay. 2007. Evaluating the accuracy of implicit feedback from clicks and query reformulations in Web search. ACM Trans. Inf. Syst., Vol. 25, 2 (2007), 7.

Digital Library

[24]

Thorsten Joachims, Adith Swaminathan, and Tobias Schnabel. 2017. Unbiased Learning-to-Rank with Biased Feedback. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, WSDM 2017, Cambridge, United Kingdom, February 6--10, 2017.

Digital Library

[25]

Dugang Liu, Pengxiang Cheng, Hong Zhu, Zhenhua Dong, Xiuqiang He, Weike Pan, and Zhong Ming. 2021. Mitigating Confounding Bias in Recommendation via Information Bottleneck. In RecSys '21: Fifteenth ACM Conference on Recommender Systems, Amsterdam, The Netherlands, 27 September 2021 - 1 October 2021.

Digital Library

[26]

Claudio Lucchese, Franco Maria Nardini, Salvatore Orlando, Raffaele Perego, Fabrizio Silvestri, and Salvatore Trani. 2016. Post-Learning Optimization of Tree Ensembles for Efficient Ranking. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, SIGIR 2016, Pisa, Italy, July 17--21, 2016.

Digital Library

[27]

Dan Luo, Lixin Zou, Qingyao Ai, Zhiyu Chen, Dawei Yin, and Brian D. Davison. 2023. Model-based Unbiased Learning to Rank. In Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, WSDM 2023, Singapore, 27 February 2023 - 3 March 2023. ACM, 895--903.

[28]

Haitao Mao, Lixin Zou, Yujia Zheng, Jiliang Tang, Xiaokai Chu, Jiashu Zhao, and Dawei Yin. 2022. Whole Page Unbiased Learning to Rank. arXiv preprint arXiv:2210.10718 (2022).

[29]

Jiaxin Mao, Zhumin Chu, Yiqun Liu, Min Zhang, and Shaoping Ma. 2019. Investigating the Reliability of Click Models. In Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval, ICTIR 2019, Santa Clara, CA, USA, October 2--5, 2019.

Digital Library

[30]

Jiaxin Mao, Cheng Luo, Min Zhang, and Shaoping Ma. 2018. Constructing Click Models for Mobile Search. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, SIGIR 2018, Ann Arbor, MI, USA, July 08--12, 2019.

[31]

Judea Pearl et al. 2000. Models, reasoning and inference. Cambridge, UK: CambridgeUniversityPress, Vol. 19 (2000), 2.

[32]

Yi Ren, Hongyan Tang, and Siwen Zhu. 2022. Unbiased Learning to Rank with Biased Continuous Feedback. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, Atlanta, GA, USA, October 17--21, 2022. ACM, 1716--1725.

Digital Library

[33]

Matthew Richardson, Ewa Dominowska, and Robert Ragno. 2007. Predicting clicks: estimating the click-through rate for new ads. In Proceedings of the 16th International Conference on World Wide Web, WWW 2007, Banff, Alberta, Canada, May 8--12, 2007.

Digital Library

[34]

Masahiro Sato, Sho Takemori, Janmajay Singh, and Tomoko Ohkuma. 2020. Unbiased Learning for the Causal Effect of Recommendation. In RecSys 2020: Fourteenth ACM Conference on Recommender Systems, Virtual Event, Brazil, September 22--26, 2020.

Digital Library

[35]

Mark D. Smucker, James Allan, and Ben Carterette. 2007. A comparison of statistical significance tests for information retrieval evaluation. In Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, CIKM 2007, Lisbon, Portugal, November 6--10, 2007.

Digital Library

[36]

Nitish Srivastava, Geoffrey E. Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res., Vol. 15, 1 (2014), 1929--1958.

Digital Library

[37]

Anh Tran, Tao Yang, and Qingyao Ai. 2021. ULTRA: An Unbiased Learning To Rank Algorithm Toolbox. In CIKM '21: The 30th ACM International Conference on Information and Knowledge Management, Virtual Event, Queensland, Australia, November 1 - 5, 2021.

[38]

Ali Vardasbi, Maarten de Rijke, and Ilya Markov. 2021. Mixture-Based Correction for Position and Trust Bias in Counterfactual Learning to Rank. In CIKM '21: The 30th ACM International Conference on Information and Knowledge Management, Virtual Event, Queensland, Australia, November 1 - 5, 2021. ACM, 1869--1878.

Digital Library

[39]

Ali Vardasbi, Harrie Oosterhuis, and Maarten de Rijke. 2020. When Inverse Propensity Scoring does not Work: Affine Corrections for Unbiased Learning to Rank. In CIKM '20: The 29th ACM International Conference on Information and Knowledge Management, Virtual Event, Ireland, October 19--23, 2020.

Digital Library

[40]

Nan Wang, Zhen Qin, Xuanhui Wang, and Hongning Wang. 2021b. Non-Clicks Mean Irrelevant? Propensity Ratio Scoring As a Correction. In WSDM '21, The Fourteenth ACM International Conference on Web Search and Data Mining, Virtual Event, Israel, March 8--12, 2021. ACM, 481--489.

Digital Library

[41]

Wenjie Wang, Fuli Feng, Xiangnan He, Xiang Wang, and Tat-Seng Chua. 2021a. Deconfounded Recommendation for Alleviating Bias Amplification. In KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, Singapore, August 14--18, 2021.

[42]

Xuanhui Wang, Michael Bendersky, Donald Metzler, and Marc Najork. 2016. Learning to Rank with Selection Bias in Personal Search. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, SIGIR 2016, Pisa, Italy, July 17--21, 2016.

Digital Library

[43]

Xuanhui Wang, Nadav Golbandi, Michael Bendersky, Donald Metzler, and Marc Najork. 2018. Position Bias Estimation for Unbiased Learning to Rank in Personal Search. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, WSDM 2018, Marina Del Rey, CA, USA, February 5--9, 2019.

Digital Library

[44]

Yixin Wang, Dawen Liang, Laurent Charlin, and David M. Blei. 2020. Causal Inference for Recommender Systems. In RecSys 2020: Fourteenth ACM Conference on Recommender Systems, Virtual Event, Brazil, September 22--26, 2020.

[45]

Le Yan, Zhen Qin, Honglei Zhuang, Xuanhui Wang, Michael Bendersky, and Marc Najork. 2022. Revisiting Two-tower Models for Unbiased Learning to Rank. In SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, July 11 - 15, 2022.

Digital Library

[46]

Tao Yang, Shikai Fang, Shibo Li, Yulan Wang, and Qingyao Ai. 2020. Analysis of Multivariate Scoring Functions for Automatic Unbiased Learning to Rank. In CIKM '20: The 29th ACM International Conference on Information and Knowledge Management, Virtual Event, Ireland, October 19--23, 2020. 2277--2280.

Digital Library

[47]

Tao Yang, Cuize Han, Chen Luo, Parth Gupta, Jeff M. Phillips, and Qingyao Ai. 2023. Mitigating Exploitation Bias in Learning to Rank with an Uncertainty-aware Empirical Bayes Approach. CoRR, Vol. abs/2305.16606 (2023).

[48]

Tao Yang, Chen Luo, Hanqing Lu, Parth Gupta, Bing Yin, and Qingyao Ai. 2022. Can Clicks Be Both Labels and Features?: Unbiased Behavior Feature Collection and Uncertainty-aware Learning to Rank. In SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, July 11 - 15, 2022. ACM, 6--17.

Digital Library

[49]

Xun Yang, Fuli Feng, Wei Ji, Meng Wang, and Tat-Seng Chua. 2021. Deconfounded Video Moment Retrieval with Causal Intervention. In SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, Canada, July 11--15, 2021.

[50]

Ruohan Zhan, Changhua Pei, Qiang Su, Jianfeng Wen, Xueliang Wang, Guanyu Mu, Dong Zheng, Peng Jiang, and Kun Gai. 2022. Deconfounding Duration Bias in Watch-time Prediction for Video Recommendation. In KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14 - 18, 2022. 4472--4481.

[51]

Junqi Zhang, Jiaxin Mao, Yiqun Liu, Ruizhe Zhang, Min Zhang, Shaoping Ma, Jun Xu, and Qi Tian. 2019. Context-Aware Ranking by Constructing a Virtual Environment for Reinforcement Learning. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, CIKM 2019, Beijing, China, November 3--7, 2019.

Digital Library

[52]

Yang Zhang, Fuli Feng, Xiangnan He, Tianxin Wei, Chonggang Song, Guohui Ling, and Yongdong Zhang. 2021. Causal Intervention for Leveraging Popularity Bias in Recommendation. In SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, Canada, July 11--15, 2021.

[53]

Yunan Zhang, Le Yan, Zhen Qin, Honglei Zhuang, Jiaming Shen, Xuanhui Wang, Michael Bendersky, and Marc Najork. 2023. Towards Disentangling Relevance and Bias in Unbiased Learning to Rank. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2023, Long Beach, CA, USA, August 6--10, 2023. ACM, 5618--5627.

Digital Library

[54]

Honglei Zhuang, Zhen Qin, Xuanhui Wang, Michael Bendersky, Xinyu Qian, Po Hu, and Dan Chary Chen. 2021. Cross-Positional Attention for Debiasing Clicks. In WWW '21: The Web Conference 2021, Virtual Event / Ljubljana, Slovenia, April 19--23, 2021.

[55]

Lixin Zou, Changying Hao, Hengyi Cai, Shuaiqiang Wang, Suqi Cheng, Zhicong Cheng, Wenwen Ye, Simiu Gu, and Dawei Yin. 2022a. Approximated doubly robust search relevance estimation. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management. 3756--3765.

Digital Library

[56]

Lixin Zou, Haitao Mao, Xiaokai Chu, Jiliang Tang, Wenwen Ye, Shuaiqiang Wang, and Dawei Yin. 2022b. A large scale search dataset for unbiased learning to rank. Advances in Neural Information Processing Systems, Vol. 35 (2022), 1127--1139.

[57]

Lixin Zou, Shengqiang Zhang, Hengyi Cai, Dehong Ma, Suqi Cheng, Shuaiqiang Wang, Daiting Shi, Zhicong Cheng, and Dawei Yin. 2021. Pre-trained language model based ranking in Baidu search. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 4014--4022.

Digital Library

Cited By

Bai TChen WYang CShi C(2025)Invariant debiasing learning for recommendation via biased imputationInformation Processing & Management10.1016/j.ipm.2024.10402862:3(104028)Online publication date: May-2025
https://doi.org/10.1016/j.ipm.2024.104028

Index Terms

Unbiased Learning-to-Rank Needs Unconfounded Propensity Estimation
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking
      1. Learning to rank

Recommendations

Model-based Unbiased Learning to Rank
WSDM '23: Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining

Unbiased Learning to Rank(ULTR), i.e., learning to rank documents with biased user feedback data, is a well-known challenge in information retrieval. Existing methods in unbiased learning to rank typically rely on click modeling or inverse propensity ...
Unbiased Learning to Rank: Theory and Practice
ICTIR '18: Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval

Implicit user feedback (such as clicks and dwell time) is an important source of data for modern search engines. While heavily biased~\citejoachims2005accurately,keane2006modeling,joachims2007evaluating,yue2010beyond, it is cheap to collect and ...
Unbiased Learning to Rank with Unbiased Propensity Estimation
SIGIR '18: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval

Learning to rank with biased click data is a well-known challenge. A variety of methods has been explored to debias click data for learning to rank such as click models, result interleaving and, more recently, the unbiased learning-to-rank framework ...

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences

SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 2024

3164 pages

ISBN:9798400704314

DOI:10.1145/3626772

General Chairs:
Grace Hui Yang
Georgetown University, USA
,
Hongning Wang
Tsinghua University, China
,
Sam Han
The Washington Post, USA
,
Program Chairs:
Claudia Hauff
Spotify, Netherlands
,
Guido Zuccon
The University of Queensland, Australia
,
Yi Zhang
University of California Santa Cruz, USA

Copyright © 2024 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 July 2024

Check for updates

Author Tags

Qualifiers

Research-article

Conference

SIGIR 2024

Sponsor:

SIGIR

SIGIR 2024: The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 14 - 18, 2024

Washington DC, USA

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

1
Total Citations
View Citations
461
Total Downloads

Downloads (Last 12 months)461
Downloads (Last 6 weeks)126

Reflects downloads up to 16 Jan 2025

Other Metrics

View Author Metrics

Citations

Cited By

Bai TChen WYang CShi C(2025)Invariant debiasing learning for recommendation via biased imputationInformation Processing & Management10.1016/j.ipm.2024.10402862:3(104028)Online publication date: May-2025
https://doi.org/10.1016/j.ipm.2024.104028

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Table of Contents