DOI: 10.1145/1458082.1458142
research-article

Improved query difficulty prediction for the web

Published: 26 October 2008

Abstract

Query performance prediction aims to predict whether a query will achieve high or low average precision when retrieval is run against a particular collection. An accurate estimator of the quality of search engine results can allow the search engine to decide which queries should receive query expansion, for which queries to suggest alternative search terms, whether to adjust the sponsored results, or whether to return results from specialized collections. In this paper, we present an evaluation of state-of-the-art query performance prediction algorithms, both post-retrieval and pre-retrieval, and we analyze their sensitivity to the retrieval algorithm. We evaluate query difficulty predictors over three widely different collections and query sets and present an analysis of why prediction algorithms perform significantly worse on Web data. Finally, we introduce Improved Clarity and demonstrate that it outperforms state-of-the-art predictors on three standard collections, including two large Web collections.
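
For orientation, the baseline that Improved Clarity builds on is the Clarity score of Cronen-Townsend, Zhou, and Croft (2002), a post-retrieval predictor that measures the KL divergence between a language model of the top-ranked documents and the language model of the whole collection. The sketch below is a simplified illustration of that baseline only, not of Improved Clarity, whose details the abstract does not give. The function names, the uniform weighting of top-ranked documents, and the toy collection are assumptions made for this example; the published method weights documents by their query likelihood and applies smoothing.

```python
import math
from collections import Counter


def language_model(token_lists):
    """Maximum-likelihood unigram model over a list of tokenized documents."""
    counts = Counter(tok for doc in token_lists for tok in doc)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def clarity_score(top_docs, collection_docs):
    """KL divergence, in bits, between the top-document model and the collection model.

    Assumption: every document in top_docs also appears in collection_docs,
    so each word in the query model has non-zero collection probability.
    """
    query_model = language_model(top_docs)
    collection_model = language_model(collection_docs)
    return sum(
        p * math.log2(p / collection_model[word])
        for word, p in query_model.items()
    )


# Hypothetical toy collection of four tokenized documents.
collection = [
    ["apple", "pie", "recipe"],
    ["apple", "stock", "price"],
    ["car", "engine", "repair"],
    ["car", "stock", "market"],
]

focused = clarity_score(collection[:1], collection)  # results concentrate on one topic
diffuse = clarity_score(collection, collection)      # results mirror the whole collection

print(round(focused, 3))  # 1.667
print(diffuse)            # 0.0
```

A higher score means the retrieved documents use language that differs sharply from the collection as a whole, which this family of predictors treats as a sign of an unambiguous, easier query. A score near zero means the result set is indistinguishable from a random sample of the collection.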

    Published In

    CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management
    October 2008
    1562 pages
    ISBN:9781595939913
    DOI:10.1145/1458082

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. query clarity
    2. query performance prediction

    Qualifiers

    • Research-article

    Conference

    CIKM08: Conference on Information and Knowledge Management
    October 26 - 30, 2008
    Napa Valley, California, USA

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
