DOI: 10.1145/1076034.1076077

Efficient and self-tuning incremental query expansion for top-k query processing

Published: 15 August 2005

Abstract

    We present a novel approach for efficient and self-tuning query expansion that is embedded into a top-k query processor with candidate pruning. Traditional query expansion methods select expansion terms whose thematic similarity to the original query terms is above some specified threshold, thus generating a disjunctive query with much higher dimensionality. This poses three major problems: 1) the need for hand-tuning the expansion threshold, 2) the potential topic dilution with overly aggressive expansion, and 3) the drastically increased execution cost of a high-dimensional query. The method developed in this paper addresses all three problems by dynamically and incrementally merging the inverted lists for the potential expansion terms with the lists for the original query terms. A priority queue is used for maintaining result candidates, the pruning of candidates is based on Fagin's family of top-k algorithms, and optionally probabilistic estimators of candidate scores can be used for additional pruning. Experiments on the TREC collections for the 2004 Robust and Terabyte tracks demonstrate the increased efficiency, effectiveness, and scalability of our approach.
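The incremental merge and the candidate pruning described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `incremental_merge` and `top_k`, the toy index, and the similarity weights are all hypothetical. It assumes each inverted list is sorted by descending score, that an expansion term's score is weighted by its similarity to the original term, and that a document counts only its best-scoring expansion; the abstract does not confirm these details. The pruning follows the general style of Fagin's sorted-access algorithms with lower and upper score bounds, and the optional probabilistic estimators are left out.

```python
import heapq
from typing import Dict, Iterator, List, Tuple

Posting = Tuple[str, float]  # (doc_id, score); each list is sorted by descending score


def incremental_merge(lists: Dict[str, List[Posting]],
                      sims: Dict[str, float]) -> Iterator[Posting]:
    """Lazily merge expansion-term lists into one stream ordered by
    descending sim * score. Each document is emitted once, with its best
    weighted score (assumption: max over expansion terms)."""
    heap = []  # entries: (-weighted_score, term, position in that term's list)
    for term, postings in lists.items():
        if postings:
            heapq.heappush(heap, (-sims[term] * postings[0][1], term, 0))
    emitted = set()
    while heap:
        neg, term, pos = heapq.heappop(heap)
        doc = lists[term][pos][0]
        if pos + 1 < len(lists[term]):
            nxt = lists[term][pos + 1][1]
            heapq.heappush(heap, (-sims[term] * nxt, term, pos + 1))
        if doc not in emitted:  # the first occurrence carries the highest score
            emitted.add(doc)
            yield doc, -neg


def top_k(streams: List[Iterator[Posting]], k: int) -> List[Posting]:
    """Round-robin sorted access with bound-based early termination.
    Returns k documents with lower-bound scores, which may be smaller
    than the final scores if scanning stopped early."""
    n = len(streams)
    high = [float("inf")] * n  # upper bound on unseen scores per stream
    seen: Dict[str, Dict[int, float]] = {}  # doc -> {stream index: score}
    active = set(range(n))
    while active:
        for i in sorted(active):
            item = next(streams[i], None)
            if item is None:  # stream exhausted: nothing more to gain from it
                high[i] = 0.0
                active.discard(i)
                continue
            doc, score = item
            high[i] = score
            seen.setdefault(doc, {})[i] = score
        worst = {d: sum(s.values()) for d, s in seen.items()}
        top = heapq.nlargest(k, worst.items(), key=lambda kv: kv[1])
        if len(top) == k:
            top_docs = {d for d, _ in top}
            # Best possible score of any candidate outside the current top-k.
            rest_best = max(
                (worst[d] + sum(high[i] for i in range(n) if i not in s)
                 for d, s in seen.items() if d not in top_docs),
                default=0.0,
            )
            # Stop once no other candidate or unseen document can overtake.
            if top[-1][1] >= max(rest_best, sum(high)):
                break
    worst = {d: sum(s.values()) for d, s in seen.items()}
    return heapq.nlargest(k, worst.items(), key=lambda kv: kv[1])


# Toy data (hypothetical): one original term plus one term with expansions.
transport = [("d2", 1.0), ("d1", 0.5), ("d4", 0.25)]
expansions = {
    "disaster": [("d1", 0.75), ("d2", 0.125)],
    "accident": [("d3", 1.0), ("d1", 0.5)],
    "fire": [("d2", 0.75)],
}
sims = {"disaster": 1.0, "accident": 0.5, "fire": 0.5}
```

In this sketch the merged stream for the expanded term behaves like a single inverted list, so the top-k loop does not need to know how many expansion terms exist, and expansion lists are only read as far as the scan requires. Calling `top_k([iter(transport), incremental_merge(expansions, sims)], 2)` on the toy data stops after two rounds of sorted access, before `d4` is ever read.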

      Published In

      SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
      August 2005
      708 pages
      ISBN: 1595930345
      DOI: 10.1145/1076034

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. incremental merge
      2. probabilistic candidate pruning
      3. query expansion
      4. top-k ranking
