Natural language technology and query expansion: issues, state-of-the-art and perspectives

Journal of Intelligent Information Systems

Abstract

The abundance of available knowledge sources has spurred considerable effort in developing and enhancing Information Retrieval techniques. Users express their information needs in natural language, and successful retrieval depends heavily on how effectively they communicate the intended purpose. Natural language queries consist of multiple linguistic features that together represent the intended search goal. Linguistic characteristics that cause semantic ambiguity and misinterpretation of queries, together with additional factors such as unfamiliarity with the search environment, affect users’ ability to represent their information needs accurately, a problem referred to as the “intention gap”. This gap directly affects the relevance of the returned search results, which may not satisfy users, and is therefore a major issue impacting the effectiveness of information retrieval systems. Central to our discussion is the identification of the significant constituents that characterize query intent and their enrichment, either manual or automatic, with meaningful terms, phrases or even latent representations that capture the intended meaning. Specifically, we discuss techniques to achieve this enrichment, in particular those that use information gathered from statistical processing of term dependencies within a document corpus or from external knowledge sources such as ontologies. We lay out the anatomy of a generic linguistic-based query expansion framework and propose its module-based decomposition, covering topical issues from query processing, information retrieval, computational linguistics and ontology engineering. For each module, we review state-of-the-art solutions from the literature, categorized and analyzed in light of the techniques used.
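As a rough illustration of the corpus-statistics route mentioned above, the following minimal sketch expands a query with the terms that most often share a document with it. It is not the framework proposed in the article: the function names, the toy corpus, the whitespace tokenization and the raw document-level co-occurrence count are all assumptions chosen for brevity.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(docs):
    """Count how often each unordered pair of distinct terms shares a document."""
    pair_counts = Counter()
    for doc in docs:
        # Naive whitespace tokenization (an assumption; real systems
        # would normalize, stem and filter stopwords).
        terms = sorted(set(doc.lower().split()))
        for a, b in combinations(terms, 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def expand_query(query, docs, k=2):
    """Append up to k terms that co-occur most often with the query terms."""
    query_terms = query.lower().split()
    scores = Counter()
    for (a, b), n in build_cooccurrence(docs).items():
        if a in query_terms and b not in query_terms:
            scores[b] += n
        elif b in query_terms and a not in query_terms:
            scores[a] += n
    # Highest score first; alphabetical order breaks ties deterministically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return query_terms + [term for term, _ in ranked[:k]]


# Hypothetical toy corpus: "jaguar" is ambiguous between a car and an animal.
TOY_CORPUS = [
    "jaguar car engine",
    "jaguar car dealer",
    "jaguar cat jungle",
]

expanded = expand_query("jaguar", TOY_CORPUS, k=1)
print(expanded)
```

In this toy corpus the expansion follows the dominant sense of the ambiguous term, which hints at why purely statistical enrichment can widen the intention gap rather than close it when the user meant the minority sense.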

Author information

Corresponding author

Correspondence to Mohammed Belkhatir.

About this article

Cite this article

Selvaretnam, B., Belkhatir, M. Natural language technology and query expansion: issues, state-of-the-art and perspectives. J Intell Inf Syst 38, 709–740 (2012). https://doi.org/10.1007/s10844-011-0174-3

  • DOI: https://doi.org/10.1007/s10844-011-0174-3
