DOI: 10.1145/3488560.3498420
research-article

External Evaluation of Ranking Models under Extreme Position-Bias

Published: 15 February 2022

    Abstract

    Implicit feedback from users' behavior is a natural and scalable source for training and evaluating ranking models in human-interactive systems. However, inherent biases such as position bias are key obstacles to its effective use. This is further accentuated in cases of extreme bias, where behavioral feedback can be collected exclusively on the top-ranked result. In such cases, state-of-the-art debiasing methods cannot be applied. A prominent use case of extreme position bias is the voice shopping medium, where only a small amount of information can be presented to the user during a single interaction, resulting in user behavioral signals that are almost exclusively limited to the top offer. There is no way to know how the user would have reacted to an offer other than the top one they were actually exposed to. Thus, evaluating any new ranker with respect to a behavioral metric requires online experimentation. We propose a novel approach, based on an external estimator model, for accurately predicting the performance of a new ranker offline. The accuracy of our solution is proven theoretically and demonstrated by a series of experiments. In these experiments, we focus on the use case of purchase prediction and show that our estimator can accurately predict offline the purchase rate of different rankers over a segment of voice shopping traffic. Our predictions are validated online by comparing them to the actual performance obtained by each ranker when exposed to users.
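    To make the offline-evaluation idea more concrete, here is a minimal sketch of a generic model-based ("direct method") estimate, assuming a separately trained purchase model that can score any (request, offer) pair. It is not the authors' estimator and does not carry the paper's accuracy guarantees; the names `estimate_purchase_rate`, `toy_model`, `ascending`, `descending`, the request format, and the toy probabilities are all hypothetical. Because only the top offer is ever exposed, a propensity-weighted (IPS-style) estimator would have to divide by a near-zero exposure probability for every lower position. The sketch therefore averages the model's predicted purchase probability for whichever offer the new ranker would place first.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interfaces (assumptions, not taken from the paper):
#   a logged request is (query_id, candidate_offer_ids);
#   a ranker orders the candidates for a request;
#   a purchase model returns P(purchase | query, offer presented at the top).
Request = Tuple[str, Sequence[str]]
Ranker = Callable[[str, Sequence[str]], List[str]]
PurchaseModel = Callable[[str, str], float]


def estimate_purchase_rate(
    logged_requests: Sequence[Request],
    new_ranker: Ranker,
    purchase_model: PurchaseModel,
) -> float:
    """Model-based offline estimate of a ranker's purchase rate.

    Under extreme position bias only the top offer is exposed, so each request
    contributes the predicted purchase probability of the offer the new ranker
    would put first.
    """
    if not logged_requests:
        raise ValueError("at least one logged request is required")
    total = 0.0
    for query_id, candidates in logged_requests:
        top_offer = new_ranker(query_id, candidates)[0]  # the only offer the user hears
        total += purchase_model(query_id, top_offer)
    return total / len(logged_requests)


# Toy usage with made-up probabilities; a real purchase model would be trained
# on logged (request, top offer, purchased?) records.
TOY_PROBS = {("q1", "a"): 0.30, ("q1", "b"): 0.10, ("q2", "a"): 0.05, ("q2", "c"): 0.20}


def toy_model(query_id: str, offer: str) -> float:
    return TOY_PROBS[(query_id, offer)]


def ascending(query_id: str, candidates: Sequence[str]) -> List[str]:
    return sorted(candidates)


def descending(query_id: str, candidates: Sequence[str]) -> List[str]:
    return sorted(candidates, reverse=True)


LOGS = [("q1", ["a", "b"]), ("q2", ["a", "c"])]
print(round(estimate_purchase_rate(LOGS, ascending, toy_model), 3))   # 0.175
print(round(estimate_purchase_rate(LOGS, descending, toy_model), 3))  # 0.15
```

    In this sketch the quality of the estimate rests entirely on how well the purchase model generalizes to offers that were rarely or never shown at the top; establishing when such an external model is accurate is the paper's contribution and is not reproduced by this toy example.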

    Supplementary Material

    MP4 File (WSDM22-fp282.mp4)
    We consider the setting of model training and evaluation in the case of extreme position bias, in which behavioral feedback is limited almost exclusively to the top offer (motivated by the voice shopping medium). In this setting, there is no way to know how the user would have reacted to an offer other than the top one they were actually exposed to. Thus, evaluating any new ranker with respect to a behavioral metric requires online experimentation. In the talk, we introduce a novel approach, based on an external estimator model, for accurately predicting the performance of a new ranker offline. We demonstrate the accuracy of our solution through a series of experiments in which we focus on the use case of purchase prediction and show that our estimator can accurately predict offline the purchase rate of different rankers over a segment of voice shopping traffic. Our predictions are validated online by comparing them to the actual performance obtained by each ranker when exposed to users.

    Information

    Published In

    WSDM '22: Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining
    February 2022
    1690 pages
    ISBN:9781450391320
    DOI:10.1145/3488560
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. off-policy evaluation
    2. position bias
    3. voice search

    Qualifiers

    • Research-article

    Conference

    WSDM '22

    Acceptance Rates

    Overall Acceptance Rate 498 of 2,863 submissions, 17%

    Article Metrics

    • Downloads (last 12 months): 42
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 12 Aug 2024.

    Cited By

    • (2024) Large Language Models for Next Point-of-Interest Recommendation. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1463-1472. DOI: 10.1145/3626772.3657840. Online publication date: 10-Jul-2024.
    • (2024) A/B testing. Journal of Systems and Software, 211:C. DOI: 10.1016/j.jss.2024.112011. Online publication date: 2-Jul-2024.
    • (2023) Extended Conversion: Capturing Successful Interactions in Voice Shopping. Proceedings of the 17th ACM Conference on Recommender Systems, 826-832. DOI: 10.1145/3604915.3608836. Online publication date: 14-Sep-2023.
    • (2023) How Well do Offline Metrics Predict Online Performance of Product Ranking Models? Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 3415-3420. DOI: 10.1145/3539618.3591865. Online publication date: 19-Jul-2023.
