DOI: 10.1145/1571941.1572038
research-article

Reducing long queries using query quality predictors

Published: 19 July 2009

Abstract

Long queries frequently contain many extraneous terms that hinder retrieval of relevant documents. We present techniques to reduce long queries to more effective shorter ones that lack those extraneous terms. Our work is motivated by the observation that perfectly reducing long TREC description queries (that is, replacing each with its best-performing sub-query) can lead to an average improvement of 30% in mean average precision. Our approach transforms the reduction problem into one of learning to rank all subsets of the original query (sub-queries) by their predicted quality, and then selecting the top-ranked sub-query. We represent each sub-query with features drawn from query quality measures described in the literature, and train a ranking model on them. Replacing the original long query with the top-ranked sub-query chosen by the ranker yields a statistically significant average improvement of 8% on our test sets. Analysis of the results shows that query reduction is well suited to moderately performing long queries, and that a small set of query quality predictors is well suited to the task of ranking sub-queries.
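The reduction procedure described in the abstract (enumerate sub-queries, score each one with query quality features, keep the top-ranked one) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the three features (average IDF, maximum IDF, sub-query length), the smoothed IDF formula, the toy document-frequency table, the optional `max_len` cap, and the hand-set `weights` are all hypothetical stand-ins for the predictors and the learned ranker used in the paper.

```python
import math
from itertools import combinations
from typing import Dict, Iterator, Optional, Sequence, Tuple

SubQuery = Tuple[str, ...]


def sub_queries(terms: Sequence[str], max_len: Optional[int] = None) -> Iterator[SubQuery]:
    """Yield every non-empty subset of the query terms, preserving term order.

    A query of n terms has 2**n - 1 sub-queries; max_len is a hypothetical
    cap (not taken from the paper) that limits the search for long queries.
    """
    upper = len(terms) if max_len is None else min(max_len, len(terms))
    for k in range(1, upper + 1):
        yield from combinations(terms, k)


def idf(term: str, doc_freq: Dict[str, int], num_docs: int) -> float:
    # Smoothed inverse document frequency; unseen terms are treated as df = 0.
    return math.log((num_docs + 1) / (doc_freq.get(term, 0) + 0.5))


def features(sq: SubQuery, doc_freq: Dict[str, int], num_docs: int) -> Tuple[float, float, float]:
    # Assumed pre-retrieval quality features: average IDF, maximum IDF, length.
    idfs = [idf(t, doc_freq, num_docs) for t in sq]
    return (sum(idfs) / len(idfs), max(idfs), float(len(sq)))


def score(feats: Sequence[float], weights: Sequence[float]) -> float:
    # Linear scoring function; in a real system a trained ranker supplies the weights.
    return sum(f * w for f, w in zip(feats, weights))


def reduce_query(
    terms: Sequence[str],
    doc_freq: Dict[str, int],
    num_docs: int,
    weights: Sequence[float],
    max_len: Optional[int] = None,
) -> SubQuery:
    """Return the sub-query with the highest predicted quality score."""
    return max(
        sub_queries(terms, max_len),
        key=lambda sq: score(features(sq, doc_freq, num_docs), weights),
    )


if __name__ == "__main__":
    # Toy collection statistics, invented purely for illustration.
    num_docs = 1000
    doc_freq = {"the": 990, "effects": 300, "acid": 20, "rain": 40}
    query = ["the", "effects", "acid", "rain"]
    print(reduce_query(query, doc_freq, num_docs, weights=(1.0, 0.0, 0.5)))
```

Running the script prints `('acid', 'rain')`: with these weights the length bonus lets the two high-IDF terms outrank either term alone, while the low-IDF terms "the" and "effects" are dropped. Because the candidate set grows exponentially with query length, a practical system would need some way to restrict it; the `max_len` parameter is only one hypothetical option.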

    Information

    Published In

    SIGIR '09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
    July 2009
    896 pages
    ISBN:9781605584836
    DOI:10.1145/1571941

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. long queries
    2. query quality
    3. query reduction
    4. verbose queries

    Qualifiers

    • Research-article

    Conference

    SIGIR '09

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 20
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 23 Dec 2024

    Cited By

    • (2024) Multifaceted Reformulations for Null & Low queries and its parallelism with Counterfactuals. 2024 IEEE 40th International Conference on Data Engineering (ICDE), pp. 5327-5333. DOI: 10.1109/ICDE60146.2024.00401. Online publication date: 13-May-2024.
    • (2024) Is your search query well-formed? A natural query understanding for patent prior art search. World Patent Information 76, article 102254. DOI: 10.1016/j.wpi.2023.102254. Online publication date: Mar-2024.
    • (2023) How the Quality of Maintenance Tasks is Affected by Criteria for Selecting Engineers for Collaboration. ACM Transactions on Software Engineering and Methodology 32(3), pp. 1-22. DOI: 10.1145/3561384. Online publication date: 26-Apr-2023.
    • (2022) Understanding Questions that Arise When Working with Business Documents. Proceedings of the ACM on Human-Computer Interaction 6(CSCW2), pp. 1-24. DOI: 10.1145/3555761. Online publication date: 11-Nov-2022.
    • (2022) Utilizing Automatic Query Reformulations as Genetic Operations to Improve Feature Location in Software Models. IEEE Transactions on Software Engineering 48(2), pp. 713-731. DOI: 10.1109/TSE.2020.3000520. Online publication date: 1-Feb-2022.
    • (2021) Query Reformulation for Descriptive Queries of Jargon Words Using a Knowledge Graph based on a Dictionary. Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pp. 854-862. DOI: 10.1145/3459637.3482382. Online publication date: 26-Oct-2021.
    • (2021) Pre-training for Ad-hoc Retrieval. Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pp. 1212-1221. DOI: 10.1145/3459637.3482286. Online publication date: 26-Oct-2021.
    • (2020) Incremental Information Retrieval. Proceedings of the 10th International Conference on Web Intelligence, Mining and Semantics, pp. 169-177. DOI: 10.1145/3405962.3405969. Online publication date: 30-Jun-2020.
    • (2020) Evaluating Low-Cost in Internal Crowdsourcing for Software Engineering: The Case of Feature Location in an Industrial Environment. IEEE Access 8, pp. 65745-65757. DOI: 10.1109/ACCESS.2020.2985915. Online publication date: 2020.
    • (2019) Automatic Boolean Query Refinement for Systematic Review Literature Search. The World Wide Web Conference, pp. 1646-1656. DOI: 10.1145/3308558.3313544. Online publication date: 13-May-2019.