research-article

Improved query difficulty prediction for the web

Authors:

Vanessa Murdock,

Ricardo Baeza-Yates

CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management

Pages 439 - 448

https://doi.org/10.1145/1458082.1458142

Published: 26 October 2008

Abstract

Query performance prediction aims to predict whether a query will achieve high or low average precision when run against a particular collection. An accurate estimator of the quality of search engine results can allow the search engine to decide to which queries to apply query expansion and for which queries to suggest alternative search terms, to adjust the sponsored results, or to return results from specialized collections. In this paper we present an evaluation of state-of-the-art query performance prediction algorithms, both pre-retrieval and post-retrieval, and we analyze their sensitivity to the retrieval algorithm. We evaluate query difficulty predictors over three widely different collections and query sets, and present an analysis of why prediction algorithms perform significantly worse on Web data. Finally, we introduce Improved Clarity, and demonstrate that it outperforms state-of-the-art predictors on three standard collections, including two large Web collections.
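Improved Clarity takes its name from the Clarity score of Cronen-Townsend, Zhou, and Croft (SIGIR 2002), a post-retrieval predictor defined as the KL divergence, in bits, between a query language model estimated from the top-ranked documents and the collection language model. As background only, here is a minimal Python sketch of that original Clarity score, not of Improved Clarity, whose modifications the abstract does not describe. The sketch assumes Jelinek-Mercer smoothed document models with an illustrative weight `lam=0.6`, a uniform document prior, and non-empty token lists; all helper names (`collection_model`, `clarity_score`, `average_precision`, `kendall_tau`) are hypothetical. It also includes average precision, the quantity being predicted, and Kendall's tau, one common way to score a predictor by rank-correlating its per-query scores with per-query average precision.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Set

Doc = List[str]  # a document or query as a list of tokens


def collection_model(docs: Sequence[Doc]) -> Dict[str, float]:
    """Maximum-likelihood collection model P_coll(w) over all tokens."""
    counts = Counter(w for d in docs for w in d)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def smoothed(w: str, tf: Counter, dlen: int,
             p_coll: Dict[str, float], lam: float) -> float:
    """Jelinek-Mercer smoothing (an assumption of this sketch):
    P(w|D) = lam * tf(w, D) / |D| + (1 - lam) * P_coll(w)."""
    return lam * tf[w] / dlen + (1 - lam) * p_coll.get(w, 0.0)


def clarity_score(query: Doc, top_docs: Sequence[Doc],
                  p_coll: Dict[str, float], lam: float = 0.6) -> float:
    """Original Clarity: sum_w P(w|Q) * log2(P(w|Q) / P_coll(w)), in bits.
    Assumes 0 <= lam < 1 and non-empty documents."""
    # Assumption: query terms unseen in the collection are dropped.
    terms = [q for q in query if q in p_coll]
    if not terms or not top_docs:
        return 0.0
    tfs = [Counter(d) for d in top_docs]
    # Query likelihood P(Q|D) under each smoothed document model.
    lik = [math.prod(smoothed(q, tf, len(d), p_coll, lam) for q in terms)
           for tf, d in zip(tfs, top_docs)]
    z = sum(lik)
    post = [x / z for x in lik]  # P(D|Q) with a uniform document prior
    score = 0.0
    for w, pc in p_coll.items():  # sum over the collection vocabulary
        # P(w|Q) = sum_D P(w|D) * P(D|Q)
        p_wq = sum(p * smoothed(w, tf, len(d), p_coll, lam)
                   for p, tf, d in zip(post, tfs, top_docs))
        if p_wq > 0:
            score += p_wq * math.log2(p_wq / pc)
    return score


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Non-interpolated AP: mean of precision@k at each relevant hit."""
    hits, total = 0, 0.0
    for k, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def kendall_tau(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Kendall's tau-a between predictor scores and per-query AP."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length lists of at least 2 items")
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (xs[i] - xs[j]) * (ys[i] - ys[j])
            s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant, 0 tie
    return 2 * s / (n * (n - 1))


if __name__ == "__main__":
    # Toy collection (hypothetical data for illustration only).
    docs = [
        ["jaguar", "car", "engine", "car"],
        ["jaguar", "cat", "jungle", "cat"],
        ["car", "engine", "speed", "car"],
        ["cat", "jungle", "prey", "cat"],
    ]
    p_coll = collection_model(docs)
    focused = clarity_score(["car", "engine"], [docs[0], docs[2]], p_coll)
    ambiguous = clarity_score(["jaguar"], [docs[0], docs[1]], p_coll)
    print(f"focused={focused:.3f} ambiguous={ambiguous:.3f}")
```

On the toy collection, the query whose top documents share one topic (`car engine`) receives a higher Clarity score than the ambiguous `jaguar` query, whose top documents mix two topics. Across a real query set, one would compute `clarity_score` and `average_precision` per query and pass the two lists to `kendall_tau`; this version is tau-a and counts tied pairs as zero.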

Index Terms

Improved query difficulty prediction for the web
1. Information systems
  1. Information retrieval

Published In

CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management

October 2008

1562 pages

ISBN:9781595939913

DOI:10.1145/1458082

General Chair:
James G. Shanahan, Church and Duncan Group Inc, USA

Program Chairs:
Sihem Amer-Yahia, Yahoo! Research, USA
Ioana Manolescu, INRIA, France
Yi Zhang, University of California, Santa Cruz, USA
David A. Evans, JustSystems Evans Research, USA
Alek Kolcz, Microsoft Live Labs, USA
Key-Sun Choi, KAIST, Korea
Abdur Chowdury, Twitter, USA

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CIKM08: Conference on Information and Knowledge Management

October 26 - 30, 2008

Napa Valley, California, USA

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics & Citations

Article Metrics

Total Citations: 63
Total Downloads: 1,012
Downloads (last 12 months): 7
Downloads (last 6 weeks): 0

Reflects downloads up to 30 Aug 2024

Cited By

Guo J, Lan Y (2020). Query Classification. In Query Understanding for Search Engines, pages 15-41. Online publication date: 2-Dec-2020.
https://doi.org/10.1007/978-3-030-58334-7_2

Jaiswal A, Liu H, Frommholz I (2020). Utilising Information Foraging Theory for User Interaction with Image Query Auto-Completion. In Advances in Information Retrieval, pages 666-680. Online publication date: 8-Apr-2020.
https://doi.org/10.1007/978-3-030-45439-5_44

Melucci M, Paggiaro A (2019). Evaluation of information retrieval systems using structural equation modeling. Computer Science Review 31, pages 1-18. Online publication date: Feb-2019.
https://doi.org/10.1016/j.cosrev.2018.10.001

Melucci M (2018). A characterization of sample selection bias in system evaluation and the case of information retrieval. International Journal of Data Science and Analytics 6(2), pages 131-146. Online publication date: 5-Jul-2018.
https://doi.org/10.1007/s41060-018-0134-x

Vicente-López E, Campos L, Fernández-Luna J, Huete J (2018). Predicting IR personalization performance using pre-retrieval query predictors. Journal of Intelligent Information Systems 51(3), pages 597-620. Online publication date: 1-Dec-2018.
https://dl.acm.org/doi/10.1007/s10844-018-0498-3

Tran V, Fuhr N (2018). Personalised Session Difficulty Prediction in an Online Academic Search Engine. In Digital Libraries for Open Knowledge, pages 174-185. Online publication date: 5-Sep-2018.
https://doi.org/10.1007/978-3-030-00066-0_15

Roitman H, Erera S, Sar-Shalom O, Weiner B (2017). Enhanced Mean Retrieval Score Estimation for Query Performance Prediction. In Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval (eds. Kamps J, Kanoulas E, de Rijke M, Fang H, Yilmaz E), pages 35-42. Online publication date: 1-Oct-2017.
https://dl.acm.org/doi/10.1145/3121050.3121051

Jin Z, Cao J, Zhang Y, Zhou J, Tian Q (2017). Novel Visual and Statistical Image Features for Microblogs News Verification. IEEE Transactions on Multimedia 19(3), pages 598-608. Online publication date: 1-Mar-2017.
https://dl.acm.org/doi/10.1109/TMM.2016.2617078

Bertram J, Mandl T (2017). Ambiguity in patent vocabulary: Experiments with clarity scores for claims and descriptions. In 2017 9th International Conference on Knowledge and Smart Technology (KST), pages 365-370. Online publication date: Feb-2017.
https://doi.org/10.1109/KST.2017.7886135

Molina S, Mothe J, Roques D, Tanguy L, Ullah M (2017). IRIT-QFR: IRIT Query Feature Resource. In Experimental IR Meets Multilinguality, Multimodality, and Interaction, pages 69-81. Online publication date: 17-Aug-2017.
https://doi.org/10.1007/978-3-319-65813-1_6
