Article

Learning to estimate query difficulty: including applications to missing content detection and distributed information retrieval

Authors:

Adam Darlow

SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval

Pages 512 - 519

https://doi.org/10.1145/1076034.1076121

Published: 15 August 2005

Abstract

In this article, we present novel learning methods for estimating the quality of the results a search engine returns in response to a query. Estimation is based on the agreement between the top results of the full query and the top results of its sub-queries. We demonstrate the usefulness of quality estimation for several applications, among them improving retrieval, detecting queries for which no relevant content exists in the document collection, and distributed information retrieval. Experiments on TREC data demonstrate the robustness and effectiveness of our learning algorithms.
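The abstract bases quality estimation on how well the top results of the full query agree with those of its sub-queries. The sketch below illustrates only that agreement signal, under stated assumptions: each sub-query is a single query term, agreement is the overlap between the two top-k result lists, and the overlaps are summarized as a normalized histogram. The names `SearchFn`, `overlap_at_k`, `agreement_histogram` and `naive_quality`, the value of k, and the toy data are all hypothetical; `naive_quality` is a fixed hand-made score standing in for the learned estimator, which is not shown.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical search interface: a query string in, a ranked list of document IDs out.
SearchFn = Callable[[str], List[str]]


def overlap_at_k(full: Sequence[str], sub: Sequence[str], k: int = 10) -> int:
    """Count documents that appear in the top-k of both ranked lists."""
    return len(set(full[:k]) & set(sub[:k]))


def agreement_histogram(query: str, search: SearchFn, k: int = 10) -> List[float]:
    """Normalized histogram of top-k overlaps between the full query and its sub-queries.

    Assumption: each whitespace-separated term is one sub-query.
    Bin i holds the fraction of sub-queries whose top-k overlap with the full query is i.
    """
    full_top = search(query)
    hist = [0.0] * (k + 1)
    for term in query.split():
        hist[overlap_at_k(full_top, search(term), k)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def naive_quality(hist: List[float]) -> float:
    """Hand-made stand-in for a learned estimator: mean overlap divided by k.

    Returns a value in [0, 1]; higher means more agreement between sub-queries and the full query.
    """
    k = len(hist) - 1
    return sum(i * h for i, h in enumerate(hist)) / k if k else 0.0


# Toy in-memory "engine" (illustrative data, not from the paper).
toy_results: Dict[str, List[str]] = {
    "jaguar speed": ["d1", "d2", "d3", "d4"],
    "jaguar": ["d1", "d2", "d9", "d8"],
    "speed": ["d7", "d6", "d3", "d5"],
}
hist = agreement_histogram("jaguar speed", lambda q: toy_results.get(q, []), k=4)
print(hist)                 # [0.0, 0.5, 0.5, 0.0, 0.0]
print(naive_quality(hist))  # 0.375
```

In the learned setting described by the abstract, a trained model rather than this fixed average would map such features to a quality estimate. A low estimate could, for example, flag a query as possibly lacking relevant content, or down-weight one collection's results when merging results in distributed retrieval.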

Cited By

Saleminezhad A, Arabzadeh N, Beheshti S, Bagheri E (2024). Context-Aware Query Term Difficulty Estimation for Performance Prediction. Advances in Information Retrieval, pp. 30-39. DOI: 10.1007/978-3-031-56066-8_4. Online publication date: 15-Mar-2024.
https://doi.org/10.1007/978-3-031-56066-8_4
Poesina E, Ionescu R, Mothe J (2023). iQPP: A Benchmark for Image Query Performance Prediction. In Chen H, Duh W, Huang H, Kato M, Mothe J, Poblete B (eds.), Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2953-2963. DOI: 10.1145/3539618.3591901. Online publication date: 19-Jul-2023.
https://dl.acm.org/doi/10.1145/3539618.3591901
Datta S, Ganguly D, Mitra M, Greene D (2022). A Relative Information Gain-based Query Performance Prediction Framework with Generated Query Variants. ACM Transactions on Information Systems 41(2), pp. 1-31. DOI: 10.1145/3545112. Online publication date: 21-Dec-2022.
https://dl.acm.org/doi/10.1145/3545112

Index Terms

Learning to estimate query difficulty: including applications to missing content detection and distributed information retrieval
1. Information systems
  1. Information retrieval


Information & Contributors

Information

Published In

ACM Conferences

SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval

August 2005

708 pages

ISBN:1595930345

DOI:10.1145/1076034

General Chairs:
Ricardo Baeza-Yates, University of Chile, Chile
Nivio Ziviani, Federal University of Minas Gerais, Brazil

Program Chairs:
Gary Marchionini, University of North Carolina, USA
Alistair Moffat, University of Melbourne, Australia
John Tait, University of Sunderland, UK

Copyright © 2005 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tag

query difficulty estimation

Qualifiers

Article

Conference

SIGIR05

Sponsor:

SIGIR

SIGIR05: The 28th ACM/SIGIR International Symposium on Information Retrieval 2005

August 15 - 19, 2005

Salvador, Brazil

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 154
Total Downloads: 2,277
Downloads (last 12 months): 11
Downloads (last 6 weeks): 0

Reflects downloads up to 10 Feb 2025

Citations

Hsu C, Chen T, Chen H (2022). Experience: Analyzing Missing Web Page Visits and Unintentional Web Page Visits from the Client-side Web Logs. Journal of Data and Information Quality 14(2), pp. 1-17. DOI: 10.1145/3490392. Online publication date: 23-Mar-2022.
https://dl.acm.org/doi/10.1145/3490392
Roitman H, Mass Y, Feigenblat G, Shraga R (2020). Query Performance Prediction for Multifield Document Retrieval. In Balog K, Setty V, Lioma C, Liu Y, Zhang M, Berberich K (eds.), Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval, pp. 49-52. DOI: 10.1145/3409256.3409821. Online publication date: 14-Sep-2020.
https://dl.acm.org/doi/10.1145/3409256.3409821
Déjean S, Ionescu R, Mothe J, Ullah M (2020). Forward and backward feature selection for query performance prediction. In Hung C, Cerny T, Shin D, Bechini A (eds.), Proceedings of the 35th Annual ACM Symposium on Applied Computing, pp. 690-697. DOI: 10.1145/3341105.3373904. Online publication date: 30-Mar-2020.
https://dl.acm.org/doi/10.1145/3341105.3373904
Guo J, Lan Y (2020). Query Classification. Query Understanding for Search Engines, pp. 15-41. DOI: 10.1007/978-3-030-58334-7_2. Online publication date: 2-Dec-2020.
https://doi.org/10.1007/978-3-030-58334-7_2
Ferro N, Sanderson M (2019). Improving the Accuracy of System Performance Estimation by Using Shards. In Piwowarski B, Chevalier M, Gaussier E, Maarek Y, Nie J, Scholer F (eds.), Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 805-814. DOI: 10.1145/3331184.3338062. Online publication date: 18-Jul-2019.
https://dl.acm.org/doi/10.1145/3331184.3338062
Zendel O, Shtok A, Raiber F, Kurland O, Culpepper J (2019). Information Needs, Queries, and Query Performance Prediction. In Piwowarski B, Chevalier M, Gaussier E, Maarek Y, Nie J, Scholer F (eds.), Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 395-404. DOI: 10.1145/3331184.3331253. Online publication date: 18-Jul-2019.
https://dl.acm.org/doi/10.1145/3331184.3331253
Roy D, Ganguly D, Mitra M, Jones G (2019). Estimating Gaussian mixture models in the local neighbourhood of embedded word vectors for query performance prediction. Information Processing and Management: an International Journal 56(3), pp. 1026-1045. DOI: 10.1016/j.ipm.2018.10.009. Online publication date: 1-May-2019.
https://dl.acm.org/doi/10.1016/j.ipm.2018.10.009