research-article

A model to estimate intrinsic document relevance from the clickthrough logs of a web search engine

Authors:

Georges Dupret,

Ciya Liao

WSDM '10: Proceedings of the third ACM international conference on Web search and data mining

Pages 181 - 190

https://doi.org/10.1145/1718487.1718510

Published: 04 February 2010

Abstract

We propose a new model to interpret the clickthrough logs of a web search engine. This model is based on explicit assumptions about user behavior. In particular, we draw conclusions about a document's relevance by observing the user's behavior after he or she has examined the document, rather than from whether the user clicks on the document URL. This results in a model based on intrinsic relevance, as opposed to perceived relevance. We use the model to predict document relevance and then use this prediction as a feature for a "Learning to Rank" machine learning algorithm. Comparing the ranking functions obtained by training the algorithm with and without the new feature, we observe surprisingly good results. This is particularly notable given that the baseline we use is the heavily optimized ranking function of a leading commercial search engine. A deeper analysis shows that the new feature is particularly helpful for non-navigational queries and for queries with a high abandonment rate or a large average number of queries per session. This is important because these types of queries are considered the most difficult to solve.
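The abstract contrasts perceived relevance (whether a result attracts clicks) with intrinsic relevance (what the user does after examining the document). The sketch below is not the authors' model. It is a deliberately simplified, hypothetical heuristic that assumes each log record holds one query, the ranked URLs shown, and the URLs clicked in order, and it treats the last click of a record as the point where the user stopped searching. The names `QueryRecord`, `relevance_signals`, and `abandonment_rate` are illustrative only. The sketch also computes the per-query abandonment rate the abstract mentions.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class QueryRecord:
    """Hypothetical log record: one query, the results shown, the clicks made."""
    query: str
    results: list[str]                               # URLs shown, top to bottom
    clicks: list[str] = field(default_factory=list)  # clicked URLs, in click order


def relevance_signals(records):
    """Return {(query, url): (ctr, last_click_rate)} for every shown result.

    ctr             -- clicks / impressions, a proxy for *perceived* relevance
    last_click_rate -- share of a result's clicks after which nothing else was
                       clicked; a crude post-click proxy, NOT the paper's model
    """
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    last_clicks = defaultdict(int)
    for r in records:
        for url in r.results:
            impressions[(r.query, url)] += 1
        for url in r.clicks:
            clicks[(r.query, url)] += 1
        if r.clicks:
            # Assumption: the final click of a record is where the user stopped.
            last_clicks[(r.query, r.clicks[-1])] += 1

    signals = {}
    for key, n_shown in impressions.items():
        n_clicked = clicks[key]
        ctr = n_clicked / n_shown
        last_click_rate = last_clicks[key] / n_clicked if n_clicked else 0.0
        signals[key] = (ctr, last_click_rate)
    return signals


def abandonment_rate(records):
    """Fraction of records per query that received no click at all."""
    shown = defaultdict(int)
    abandoned = defaultdict(int)
    for r in records:
        shown[r.query] += 1
        if not r.clicks:
            abandoned[r.query] += 1
    return {q: abandoned[q] / shown[q] for q in shown}


if __name__ == "__main__":
    logs = [
        QueryRecord("jaguar", ["a", "b", "c"], ["a", "b"]),
        QueryRecord("jaguar", ["a", "b", "c"], ["a", "b"]),
        QueryRecord("jaguar", ["a", "b", "c"], ["b"]),
        QueryRecord("jaguar", ["a", "b", "c"]),  # abandoned: no click
    ]
    for (query, url), (ctr, lcr) in sorted(relevance_signals(logs).items()):
        print(f"{query} {url}: ctr={ctr:.2f} last_click_rate={lcr:.2f}")
    print("abandonment:", abandonment_rate(logs))
```

In this toy log, result `a` is clicked in half of its impressions but is never the last click, while `b` is the last click every time it is clicked. That gap between attracting clicks and ending the search illustrates the distinction between perceived and intrinsic relevance. The paper's model rests on its own explicit behavioral assumptions and is not reproduced here.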


Cited By

Li Z, Zhang C, Song D, Nejdl W, Auer S, Karras O, Cha M, Moens M, Najork M (2025). Dynamic Interaction-Driven Intent Evolver with Semantic Probability Distributions. Proceedings of the Eighteenth ACM International Conference on Web Search and Data Mining, pages 290-299. DOI: 10.1145/3701551.3703508. Online publication date: 10-Mar-2025.
https://dl.acm.org/doi/10.1145/3701551.3703508
Xiong H, Bian J, Li Y, Li X, Du M, Wang S, Yin D, Helal S (2024). When Search Engine Services meet Large Language Models: Visions and Challenges. IEEE Transactions on Services Computing, pages 1-23. DOI: 10.1109/TSC.2024.3451185. Online publication date: 2024.
https://doi.org/10.1109/TSC.2024.3451185
Strzelecki A, Miklosik A (2024). Device-dependent click-through rate estimation in Google organic search results based on clicks and impressions data. Aslib Journal of Information Management. DOI: 10.1108/AJIM-04-2023-0107. Online publication date: 10-Jan-2024.
https://doi.org/10.1108/AJIM-04-2023-0107

Index Terms

1. Information systems
  1. Information systems applications



Information

Published In

WSDM '10: Proceedings of the third ACM international conference on Web search and data mining

February 2010

468 pages

ISBN:9781605588896

DOI:10.1145/1718487

General Chairs:
Brian D. Davison, Lehigh University, USA
Torsten Suel, Polytechnic Institute of NYU, USA

Program Chairs:
Nick Craswell, Microsoft, USA
Bing Liu, University of Illinois, Chicago, USA

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

WSDM'10: Third ACM International Conference on Web Search and Data Mining

February 4-6, 2010

New York, New York, USA

Acceptance Rates

Overall Acceptance Rate 498 of 2,863 submissions, 17%

Article Metrics

Total Citations: 85
Total Downloads: 1,041
Downloads (Last 12 months): 27
Downloads (Last 6 weeks): 7

Reflects downloads up to 05 Mar 2025

Citations

Karamiyan F, Mahootchi M, Mohebi A (2024). A personalized ranking method based on inverse reinforcement learning in search engines. Engineering Applications of Artificial Intelligence, 136:PA. DOI: 10.1016/j.engappai.2024.108915. Online publication date: 1-Oct-2024.
https://dl.acm.org/doi/10.1016/j.engappai.2024.108915
Roßrucker G (2024). State of the Art. In Supporting Web Search and Navigation by an Overlay Linking Structure, pages 9-35. DOI: 10.1007/978-3-031-48393-6_2. Online publication date: 3-Jan-2024.
https://doi.org/10.1007/978-3-031-48393-6_2
Su Z, Dou Z, Zhou Y, Zhao Z, Wen J, Singh A, Sun Y, Akoglu L, Gunopulos D, Yan X, Kumar R, Ozcan F, Ye J (2023). PSLOG: Pretraining with Search Logs for Document Ranking. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 2072-2082. DOI: 10.1145/3580305.3599477. Online publication date: 6-Aug-2023.
https://dl.acm.org/doi/10.1145/3580305.3599477
Niu X, Wu Y, Lu X, Nagpal G, Pronin P, Hao K, Liao Z, Liao G, Chen H, Duh W, Huang H, Kato M, Mothe J, Poblete B (2023). Facebook Content Search: Efficient and Effective Adapting Search on A Large Scale. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 3290-3294. DOI: 10.1145/3539618.3591840. Online publication date: 19-Jul-2023.
https://dl.acm.org/doi/10.1145/3539618.3591840
Hajian Hoseinabadi A, CheshmehSohrabi M (2022). Proposing a New Combined Indicator for Measuring Search Engine Performance and Evaluating Google, Yahoo, DuckDuckGo, and Bing Search Engines based on Combined Indicator. Journal of Librarianship and Information Science, 56:1, pages 178-197. DOI: 10.1177/09610006221138579. Online publication date: 8-Dec-2022.
https://doi.org/10.1177/09610006221138579
Roßrucker G, Unger H, Kubek M (2022). State-of-the-Art Survey on Web Search. In The Autonomous Web, pages 1-24. DOI: 10.1007/978-3-030-90936-9_1. Online publication date: 1-Jan-2022.
https://doi.org/10.1007/978-3-030-90936-9_1
Zheng Y, Liu Y, Fan Z, Luo C, Ai Q, Zhang M, Ma S (2019). Investigating Weak Supervision in Deep Ranking. Data and Information Management, 3:3, pages 155-164. DOI: 10.2478/dim-2019-0010. Online publication date: Sep-2019.
https://doi.org/10.2478/dim-2019-0010