Modeling reformulation using query distributions

Published: 17 May 2013
  • Abstract

    Query reformulation modifies the original query with the aim of better matching the vocabulary of the relevant documents, and consequently improving ranking effectiveness. Previous models typically generate words and phrases related to the original query, but do not consider how these words and phrases would fit together in actual queries. In this article, a novel framework is proposed that models reformulation as a distribution of actual queries, where each query is a variation of the original query. This approach considers an actual query as the basic unit and thus captures important query-level dependencies between words and phrases. An implementation of this framework that only uses publicly available resources is proposed, which makes fair comparisons with other methods using TREC collections possible. Specifically, this implementation consists of a query generation step that analyzes the passages containing query words to generate reformulated queries and a probability estimation step that learns a distribution for reformulated queries by optimizing the retrieval performance. Experiments on TREC collections show that the proposed model can significantly outperform previous reformulation models.
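The idea of ranking with a distribution over reformulated queries can be illustrated with a small sketch. This is not the authors' implementation. The toy documents, the candidate reformulations and their probabilities, and the Dirichlet-smoothed unigram scorer are all hypothetical stand-ins chosen for illustration. The sketch assumes the query generation and probability estimation steps have already produced a weighted set of queries, and shows only one plausible way to combine per-query retrieval scores into a single document score.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc_tokens, coll_counts, coll_len, mu=10.0):
    """Dirichlet-smoothed unigram log-likelihood of `query` given one document.

    A stand-in scorer (assumption); any retrieval function could be used here.
    """
    doc_counts = Counter(doc_tokens)
    vocab_size = len(coll_counts)
    score = 0.0
    for term in query.split():
        # Add-one smoothing keeps the collection probability non-zero
        # for terms that never occur in the toy collection.
        p_coll = (coll_counts[term] + 1) / (coll_len + vocab_size)
        score += math.log((doc_counts[term] + mu * p_coll) / (len(doc_tokens) + mu))
    return score


def score_with_query_distribution(query_dist, doc_tokens, coll_counts, coll_len):
    """Weight each reformulated query's score by its (normalized) probability."""
    total = sum(query_dist.values())
    return sum(
        (weight / total) * query_log_likelihood(q, doc_tokens, coll_counts, coll_len)
        for q, weight in query_dist.items()
    )


def rank(query_dist, docs):
    """Return document ids sorted from best to worst combined score."""
    coll_counts = Counter(tok for toks in docs.values() for tok in toks)
    coll_len = sum(coll_counts.values())
    scores = {
        doc_id: score_with_query_distribution(query_dist, toks, coll_counts, coll_len)
        for doc_id, toks in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy collection and a hand-written query distribution.
DOCS = {
    "d1": "oil spill cleanup methods in the gulf".split(),
    "d2": "petroleum leak removal techniques".split(),
    "d3": "cooking oil recipes".split(),
}
QUERY_DIST = {
    "oil spill cleanup": 0.5,       # the original query
    "petroleum leak removal": 0.3,  # a substituted variant
    "oil spill removal": 0.2,       # a partially substituted variant
}

if __name__ == "__main__":
    print(rank(QUERY_DIST, DOCS))
```

Because each variant is scored as a whole query, a document that matches only the substituted vocabulary (here `d2`) can still receive credit, while a document that shares a single word with the original query (here `d3`) does not benefit from the reformulations. In the article the query probabilities are learned by optimizing retrieval performance; here they are simply fixed by hand.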

      Published In

      ACM Transactions on Information Systems, Volume 31, Issue 2
      May 2013
      180 pages
      ISSN:1046-8188
      EISSN:1558-2868
      DOI:10.1145/2457465

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 17 May 2013
      Accepted: 01 November 2012
      Revised: 01 June 2012
      Received: 01 November 2011
      Published in TOIS Volume 31, Issue 2

      Author Tags

      1. Query reformulation
      2. information retrieval
      3. passage analysis
      4. query segmentation
      5. query substitution

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Article Metrics

      • Downloads (last 12 months): 6
      • Downloads (last 6 weeks): 1
      Reflects downloads up to 26 Jul 2024

      Cited By

      • (2023) Improving Search Clarification with Structured Information Extracted from Search Results. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 3549-3558. DOI: 10.1145/3580305.3599389. Online publication date: 6-Aug-2023
      • (2023) Query Sub-intent Mining by Incorporating Search Results with Query Logs for Information Retrieval. 2023 IEEE 8th International Conference on Big Data Analytics (ICBDA), 180-186. DOI: 10.1109/ICBDA57405.2023.10104948. Online publication date: 3-Mar-2023
      • (2023) DeepQFM: a deep learning based query facets mining method. Information Retrieval 26:1-2. DOI: 10.1007/s10791-023-09427-0. Online publication date: 30-Oct-2023
      • (2022) Revisiting Open Domain Query Facet Extraction and Generation. Proceedings of the 2022 ACM SIGIR International Conference on Theory of Information Retrieval, 43-50. DOI: 10.1145/3539813.3545138. Online publication date: 23-Aug-2022
      • (2022) Stochastic Optimization of Text Set Generation for Learning Multiple Query Intent Representations. Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 4003-4008. DOI: 10.1145/3511808.3557666. Online publication date: 17-Oct-2022
      • (2021) Learning Multiple Intent Representations for Search Queries. Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 669-679. DOI: 10.1145/3459637.3482445. Online publication date: 26-Oct-2021
      • (2021) Strong natural language query generation. Information Retrieval Journal. DOI: 10.1007/s10791-021-09395-3. Online publication date: 15-Jul-2021
      • (2019) Relevance Feedback. ACM Transactions on Information Systems 37:4, 1-28. DOI: 10.1145/3360487. Online publication date: 4-Oct-2019
      • (2019) Boosting Search Performance Using Query Variations. ACM Transactions on Information Systems 37:4, 1-25. DOI: 10.1145/3345001. Online publication date: 4-Oct-2019
      • (2019) Relevance Modeling with Multiple Query Variations. Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval, 27-34. DOI: 10.1145/3341981.3344224. Online publication date: 26-Sep-2019
