Reducing long queries using query quality predictors

Authors:

Giridhar Kumaran,

Vitor R. Carvalho

SIGIR '09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval

Pages 564 - 571

https://doi.org/10.1145/1571941.1572038

Published: 19 July 2009

Abstract

Long queries frequently contain many extraneous terms that hinder retrieval of relevant documents. We present techniques to reduce long queries to more effective shorter ones that lack those extraneous terms. Our work is motivated by the observation that perfectly reducing long TREC description queries can lead to an average improvement of 30% in mean average precision. Our approach involves transforming the reduction problem into a problem of learning to rank all sub-sets of the original query (sub-queries) based on their predicted quality, and selecting the top sub-query. We use various measures of query quality described in the literature as features to represent sub-queries, and train a classifier. Replacing the original long query with the top-ranked sub-query chosen by the ranker results in a statistically significant average improvement of 8% on our test sets. Analysis of the results shows that query reduction is well-suited for moderately-performing long queries, and a small set of query quality predictors are well-suited for the task of ranking sub-queries.
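The reduction-as-ranking idea in the abstract can be illustrated with a minimal sketch. It enumerates every sub-query of a long query, represents each one with a small feature vector, scores it, and keeps the top-scoring candidate. Everything specific here is an assumption made for illustration: the two features (average IDF and sub-query length), the hand-set `weights` standing in for a trained ranker, and the toy document frequencies. The paper itself uses query quality predictors from the literature and a learned model, neither of which is reproduced below.

```python
from itertools import combinations
from math import log


def sub_queries(terms):
    """Yield every non-empty subset of the query terms, keeping term order."""
    for k in range(1, len(terms) + 1):
        yield from combinations(terms, k)


def avg_idf(sub_query, doc_freq, num_docs):
    """Average inverse document frequency, a simple pre-retrieval predictor."""
    idfs = [log(num_docs / (1 + doc_freq.get(t, 0))) for t in sub_query]
    return sum(idfs) / len(idfs)


def features(sub_query, doc_freq, num_docs):
    """Hypothetical feature vector: [average IDF, number of terms]."""
    return [avg_idf(sub_query, doc_freq, num_docs), float(len(sub_query))]


def score(feature_vector, weights):
    """Linear scoring function; the weights stand in for a trained ranker."""
    return sum(w * f for w, f in zip(weights, feature_vector))


def reduce_query(terms, doc_freq, num_docs, weights):
    """Return the sub-query with the highest predicted quality."""
    return max(
        sub_queries(terms),
        key=lambda sq: score(features(sq, doc_freq, num_docs), weights),
    )


# Toy collection statistics (invented for this example).
DOC_FREQ = {"the": 900, "effects": 300, "of": 950, "acid": 20, "rain": 40}
NUM_DOCS = 1000
WEIGHTS = [1.0, 0.5]  # hand-set, not learned

query = ["the", "effects", "of", "acid", "rain"]
best = reduce_query(query, DOC_FREQ, NUM_DOCS, WEIGHTS)
print(" ".join(best))
```

A query with n terms has 2^n - 1 non-empty sub-queries, so exhaustive enumeration as written is only practical for moderately long queries.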


Cited By

Yetukuri J, Wang Y, Khan I, Hao L, Wu Z, Liu Y (2024). Multifaceted Reformulations for Null & Low queries and its parallelism with Counterfactuals. 2024 IEEE 40th International Conference on Data Engineering (ICDE), pages 5327-5333. Online publication date: 13-May-2024
https://doi.org/10.1109/ICDE60146.2024.00401

Chikkamath R, Rastogi D, Maan M, Endres M (2024). Is your search query well-formed? A natural query understanding for patent prior art search. World Patent Information, 76 (102254). Online publication date: Mar-2024
https://doi.org/10.1016/j.wpi.2023.102254

Pérez F, Lapeña R, Marcén A, Cetina C (2023). How the Quality of Maintenance Tasks is Affected by Criteria for Selecting Engineers for Collaboration. ACM Transactions on Software Engineering and Methodology, 32:3 (1-22). Online publication date: 26-Apr-2023
https://dl.acm.org/doi/10.1145/3561384

Index Terms

Reducing long queries using query quality predictors
1. Information systems
  1. Information retrieval
    1. Information retrieval query processing


Published In

SIGIR '09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval

July 2009

896 pages

ISBN:9781605584836

DOI:10.1145/1571941

General Chairs: James Allan (University of Massachusetts Amherst, USA), Javed Aslam (Northeastern University, USA)

Program Chairs: Mark Sanderson (University of Sheffield, UK), ChengXiang Zhai (University of Illinois at Urbana-Champaign, USA), Justin Zobel (University of Melbourne, Australia)

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SIGIR '09

SIGIR '09: The 32nd International ACM SIGIR conference on research and development in Information Retrieval

July 19 - 23, 2009

Boston, MA, USA

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%
