DOI: 10.1145/1718487.1718510
research-article

A model to estimate intrinsic document relevance from the clickthrough logs of a web search engine

Published: 04 February 2010

Abstract

We propose a new model to interpret the clickthrough logs of a web search engine. The model is based on explicit assumptions about user behavior. In particular, we draw conclusions about a document's relevance by observing the user's behavior after they have examined the document, rather than from whether the user clicks the document URL. This results in a model based on intrinsic relevance, as opposed to perceived relevance. We use the model to predict document relevance and then use this prediction as a feature for a "Learning to Rank" machine learning algorithm. Comparing the ranking functions obtained by training the algorithm with and without the new feature, we observe surprisingly good results. This is particularly notable given that the baseline we use is the heavily optimized ranking function of a leading commercial search engine. A deeper analysis shows that the new feature is particularly helpful for non-navigational queries and for queries with a large abandonment rate or a large average number of queries per session. This is important because these types of queries are considered to be the most difficult to solve.
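The distinction between perceived and intrinsic relevance can be illustrated with a toy estimator. The sketch below is not the model proposed in the paper, whose details the abstract does not give. It assumes, purely for illustration, that the last clicked document in a session is the one that satisfied the user, and that every listed document was shown. The function name `estimate_relevance`, the session format, and the smoothing parameters are all hypothetical.

```python
from collections import defaultdict


def estimate_relevance(sessions, prior_hits=1.0, prior_trials=2.0):
    """Toy estimator, NOT the paper's model.

    sessions: iterable of result lists for one query, each a list of
    (doc_id, was_clicked) pairs in rank order (hypothetical format).
    Returns {doc_id: (perceived, intrinsic)}.
    """
    shown = defaultdict(int)      # times the document appeared in a result list
    clicks = defaultdict(int)     # times the document was clicked
    satisfied = defaultdict(int)  # times the document was the last click

    for results in sessions:
        clicked_docs = []
        for doc, was_clicked in results:
            shown[doc] += 1
            if was_clicked:
                clicks[doc] += 1
                clicked_docs.append(doc)
        # Assumption: the user stops after the click that satisfied them.
        if clicked_docs:
            satisfied[clicked_docs[-1]] += 1

    scores = {}
    for doc in shown:
        # Perceived relevance: how often the snippet attracts a click.
        perceived = clicks[doc] / shown[doc]
        # Intrinsic relevance proxy: how often a click ends the search,
        # smoothed so that rarely clicked documents are not over-trusted.
        intrinsic = (satisfied[doc] + prior_hits) / (clicks[doc] + prior_trials)
        scores[doc] = (perceived, intrinsic)
    return scores


sessions = [
    [("a", True), ("b", True)],
    [("a", True), ("b", False)],
    [("a", False), ("b", True)],
]
scores = estimate_relevance(sessions)
```

In this example both documents are clicked equally often, so their perceived scores match, but document "b" ends the session more often and receives the higher intrinsic score. A score of this kind could then be supplied as one extra feature to a ranking learner, which is the role the abstract describes for the model's prediction.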


    Published In

    WSDM '10: Proceedings of the third ACM international conference on Web search and data mining
    February 2010
    468 pages
    ISBN:9781605588896
    DOI:10.1145/1718487
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. clickthrough data
    2. search engines
    3. user behavior model

    Acceptance Rates

    Overall Acceptance Rate 498 of 2,863 submissions, 17%

    Article Metrics

    • Downloads (Last 12 months): 27
    • Downloads (Last 6 weeks): 7
    Reflects downloads up to 05 Mar 2025

    Cited By

    • (2025) Dynamic Interaction-Driven Intent Evolver with Semantic Probability Distributions. Proceedings of the Eighteenth ACM International Conference on Web Search and Data Mining, pages 290-299. DOI: 10.1145/3701551.3703508. Online publication date: 10-Mar-2025
    • (2024) When Search Engine Services meet Large Language Models: Visions and Challenges. IEEE Transactions on Services Computing, pages 1-23. DOI: 10.1109/TSC.2024.3451185. Online publication date: 2024
    • (2024) Device-dependent click-through rate estimation in Google organic search results based on clicks and impressions data. Aslib Journal of Information Management. DOI: 10.1108/AJIM-04-2023-0107. Online publication date: 10-Jan-2024
    • (2024) A personalized ranking method based on inverse reinforcement learning in search engines. Engineering Applications of Artificial Intelligence, 136:PA. DOI: 10.1016/j.engappai.2024.108915. Online publication date: 1-Oct-2024
    • (2024) State of the Art. Supporting Web Search and Navigation by an Overlay Linking Structure, pages 9-35. DOI: 10.1007/978-3-031-48393-6_2. Online publication date: 3-Jan-2024
    • (2023) PSLOG: Pretraining with Search Logs for Document Ranking. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 2072-2082. DOI: 10.1145/3580305.3599477. Online publication date: 6-Aug-2023
    • (2023) Facebook Content Search: Efficient and Effective Adapting Search on A Large Scale. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 3290-3294. DOI: 10.1145/3539618.3591840. Online publication date: 19-Jul-2023
    • (2022) Proposing a New Combined Indicator for Measuring Search Engine Performance and Evaluating Google, Yahoo, DuckDuckGo, and Bing Search Engines based on Combined Indicator. Journal of Librarianship and Information Science, 56:1, pages 178-197. DOI: 10.1177/09610006221138579. Online publication date: 8-Dec-2022
    • (2022) State-of-the-Art Survey on Web Search. The Autonomous Web, pages 1-24. DOI: 10.1007/978-3-030-90936-9_1. Online publication date: 1-Jan-2022
    • (2019) Investigating Weak Supervision in Deep Ranking. Data and Information Management, 3:3, pages 155-164. DOI: 10.2478/dim-2019-0010. Online publication date: Sep-2019
