Research article · Open access
DOI: 10.1145/3477495.3531948
Can Clicks Be Both Labels and Features?: Unbiased Behavior Feature Collection and Uncertainty-aware Learning to Rank

Published: 07 July 2022

Abstract

Using implicit feedback collected from user clicks as training labels for learning-to-rank algorithms is a well-developed paradigm that has been extensively studied and used in modern IR systems. Using user clicks as ranking features, on the other hand, has not been fully explored in existing literature. Despite its potential in improving short-term system performance, whether the incorporation of user clicks as ranking features is beneficial for learning-to-rank systems in the long term is still questionable. Two of the most important problems are (1) the explicit bias introduced by noisy user behavior, and (2) the implicit bias, which we refer to as the exploitation bias, introduced by the dynamic training and serving of learning-to-rank systems with behavior features. In this paper, we explore the possibility of incorporating user clicks as both training labels and ranking features for learning to rank. We formally investigate the problems in feature collection and model training, and propose a counterfactual feature projection function and a novel uncertainty-aware learning to rank framework. Experiments on public datasets show that ranking models learned with the proposed framework can significantly outperform models built with raw click features and algorithms that rank items without considering model uncertainty.
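The explicit-bias problem described above (noisy, position-biased clicks used as features) is commonly addressed with inverse propensity scoring. The sketch below is a generic textbook IPS correction, not the paper's actual counterfactual feature projection function; the function name and example propensities are hypothetical.

```python
def debiased_click_feature(clicks, propensities):
    # Divide each observed click by the probability that the user examined
    # the result at its displayed position. Under the examination hypothesis,
    # this inverse-propensity-scoring (IPS) correction gives an unbiased
    # estimate of the click rate the item would receive if always examined.
    return [c / p for c, p in zip(clicks, propensities)]

# A click at a rarely examined position (propensity 0.25) counts for more
# than a click at the top of the list (propensity 1.0).
features = debiased_click_feature(clicks=[1, 1, 0], propensities=[1.0, 0.25, 0.5])
print(features)  # [1.0, 4.0, 0.0]
```

The corrected values can then be aggregated per item and fed to the ranker as behavior features in place of raw click counts.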

Supplementary Material

MP4 File (SIGIR22-fp380.mp4)
Presentation video Despite the potential of click features in improving short-term system performance, whether the incorporation of user clicks as ranking features is beneficial for learning-to-rank systems in the long term is still questionable. In this paper, we formally investigate the problems in feature collection and model training and propose a counterfactual feature projection function and a novel uncertainty-aware learning to rank framework.
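The uncertainty-aware ranking idea can be sketched with a simple upper-confidence-bound heuristic: rank by the ensemble's mean relevance estimate plus a bonus proportional to ensemble disagreement. This is a generic UCB-style illustration under assumed names, not the framework actually proposed in the paper.

```python
import statistics

def uncertainty_aware_rank(candidate_scores, alpha=1.0):
    # candidate_scores maps a document id to the relevance scores produced
    # by each member of a model ensemble. Documents are ranked by mean
    # score plus alpha times the ensemble's standard deviation, so items
    # the models disagree on receive an exploration bonus instead of being
    # starved of impressions (the "exploitation bias" failure mode).
    def ucb(scores):
        mean = statistics.fmean(scores)
        spread = statistics.stdev(scores) if len(scores) > 1 else 0.0
        return mean + alpha * spread
    return sorted(candidate_scores, key=lambda d: ucb(candidate_scores[d]), reverse=True)

ranking = uncertainty_aware_rank({
    "doc_a": [0.9, 0.9, 0.9],    # confidently relevant
    "doc_b": [0.5, 1.2, 0.4],    # uncertain: models disagree
    "doc_c": [0.2, 0.3, 0.25],   # confidently not relevant
})
print(ranking)  # ['doc_b', 'doc_a', 'doc_c']
```

Here the uncertain document outranks the confidently relevant one, so the system keeps collecting feedback on items whose behavior features are still unreliable.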




Published In
    SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
    July 2022
    3569 pages
    ISBN:9781450387323
    DOI:10.1145/3477495
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. behavior feature
    2. exploitation bias
    3. learning to rank

    Qualifiers

    • Research-article

    Funding Sources

    • The School of Computing, University of Utah
    • Amazon Search

    Conference

    SIGIR '22

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Article Metrics

    • Downloads (last 12 months): 396
    • Downloads (last 6 weeks): 35
    Reflects downloads up to 13 Jan 2025

    Cited By
    • (2024) Improving Multi-modal Recommender Systems by Denoising and Aligning Multi-modal Content and User Feedback. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 3645-3656. DOI: 10.1145/3637528.3671703. Online: 25-Aug-2024.
    • (2024) Unbiased Learning-to-Rank Needs Unconfounded Propensity Estimation. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1535-1545. DOI: 10.1145/3626772.3657772. Online: 10-Jul-2024.
    • (2024) Mitigating Exploitation Bias in Learning to Rank with an Uncertainty-aware Empirical Bayes Approach. Proceedings of the ACM Web Conference 2024, 1486-1496. DOI: 10.1145/3589334.3645487. Online: 13-May-2024.
    • (2024) LT2R: Learning to Online Learning to Rank for Web Search. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 4733-4746. DOI: 10.1109/ICDE60146.2024.00360. Online: 13-May-2024.
    • (2024) GPR-OPT. Information Processing and Management, 61(1). DOI: 10.1016/j.ipm.2023.103525. Online: 1-Feb-2024.
    • (2024) Investigating fairness in machine learning-based audio sentiment analysis. AI and Ethics. DOI: 10.1007/s43681-024-00453-2. Online: 25-Mar-2024.
    • (2024) Unbiased Recommendation Through Invariant Representation Learning. Machine Learning and Knowledge Discovery in Databases: Applied Data Science Track, 280-296. DOI: 10.1007/978-3-031-70381-2_18. Online: 22-Aug-2024.
    • (2023) Vertical Allocation-based Fair Exposure Amortizing in Ranking. Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, 234-244. DOI: 10.1145/3624918.3625313. Online: 26-Nov-2023.
    • (2023) FARA: Future-aware Ranking Algorithm for Fairness Optimization. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 2906-2916. DOI: 10.1145/3583780.3614877. Online: 21-Oct-2023.
    • (2023) Measuring Service-Level Learning Effects in Search Via Query-Randomized Experiments. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2169-2173. DOI: 10.1145/3539618.3592020. Online: 19-Jul-2023.
