DOI: 10.1145/2911451.2911537
Research article · Open access

Learning to Rank with Selection Bias in Personal Search

Published: 07 July 2016

Abstract

Click-through data has proven to be a critical resource for improving search ranking quality. Although search engines can easily collect large amounts of click data, various biases make it difficult to fully leverage this type of data. In the past, many click models have been proposed and successfully used to estimate the relevance of individual query-document pairs in the context of web search. These click models typically require a large number of clicks for each individual pair, which makes them difficult to apply in systems where click data is highly sparse due to personalized corpora and information needs, e.g., personal search. In this paper, we study how to leverage sparse click data in personal search: we introduce a novel selection bias problem and address it in the learning-to-rank framework. We propose several bias estimation methods, including a novel query-dependent method that captures queries with similar results and can successfully deal with sparse data. Through online experiments with one of the world's largest personal search engines, we empirically demonstrate that learning-to-rank that accounts for query-dependent selection bias yields significant improvements in search effectiveness.
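The bias correction the abstract describes belongs to the family of inverse-propensity-weighting techniques for debiasing click feedback. The sketch below is a generic illustration of that general idea only, not the paper's actual estimators: the function name, propensity values, and loss term are all hypothetical, invented for the example.

```python
# Generic inverse-propensity-weighting sketch (illustrative only; not
# the paper's estimators). Each clicked result's loss contribution is
# reweighted by the inverse of the estimated probability that the
# result was observed, so clicks at rarely-examined positions count
# for more during training.

def ipw_loss(clicked_positions, propensity):
    """Average per-click loss, with each click weighted by 1/propensity."""
    total = 0.0
    for pos in clicked_positions:
        per_click_loss = 1.0  # stand-in for a pointwise/pairwise loss term
        total += per_click_loss / propensity[pos]
    return total / len(clicked_positions)

# Hypothetical position-based propensities: top-ranked results are far
# more likely to be examined, so their clicks receive smaller weights.
propensity = {0: 0.9, 1: 0.6, 2: 0.3}
weighted = ipw_loss([0, 2], propensity)
```

Under these assumed values, a click at position 2 carries roughly 3.3 times the weight of an unweighted click, matching the intuition that a click observed despite a low examination probability is stronger evidence of relevance.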



    Published In

    SIGIR '16: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval
    July 2016
    1296 pages
    ISBN:9781450340694
    DOI:10.1145/2911451
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.


    Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. learning-to-rank
    2. personal search
    3. selection bias


    Conference

    SIGIR '16

    Acceptance Rates

SIGIR '16 paper acceptance rate: 62 of 341 submissions (18%)
Overall acceptance rate: 792 of 3,983 submissions (20%)

    Article Metrics

• Downloads (last 12 months): 466
• Downloads (last 6 weeks): 67
Reflects downloads up to 15 Oct 2024


Cited By

• LightAD: accelerating AutoDebias with adaptive sampling. JUSTC 54(4):0405, 2024. DOI: 10.52396/JUSTC-2022-0100
• A bias study and an unbiased deep neural network for recommender systems. Web Intelligence 22(1):15-29, 2024. DOI: 10.3233/WEB-230036
• Meta Learning to Rank for Sparsely Supervised Queries. ACM Transactions on Information Systems, 2024. DOI: 10.1145/3698876
• Average User-Side Counterfactual Fairness for Collaborative Filtering. ACM Transactions on Information Systems 42(5):1-26, 2024. DOI: 10.1145/3656639
• Utilizing Non-click Samples via Semi-supervised Learning for Conversion Rate Prediction. Proceedings of the 18th ACM Conference on Recommender Systems, pages 350-359, 2024. DOI: 10.1145/3640457.3688151
• AIE: Auction Information Enhanced Framework for CTR Prediction in Online Advertising. Proceedings of the 18th ACM Conference on Recommender Systems, pages 633-642, 2024. DOI: 10.1145/3640457.3688136
• CoRAL: Collaborative Retrieval-Augmented Large Language Models Improve Long-tail Recommendation. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 3391-3401, 2024. DOI: 10.1145/3637528.3671901
• Counteracting Duration Bias in Video Recommendation via Counterfactual Watch Time. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 4455-4466, 2024. DOI: 10.1145/3637528.3671817
• Towards a Causal Decision-Making Framework for Recommender Systems. ACM Transactions on Recommender Systems 2(2):1-34, 2024. DOI: 10.1145/3629169
• Unbiased Learning to Rank Meets Reality: Lessons from Baidu's Large-Scale Search Dataset. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 1546-1556, 2024. DOI: 10.1145/3626772.3657892
