Natural language technology and query expansion: issues, state-of-the-art and perspectives

Selvaretnam, Bhawani; Belkhatir, Mohammed

doi:10.1007/s10844-011-0174-3

Published: 27 August 2011

Volume 38, pages 709–740 (2012)

Journal of Intelligent Information Systems

Bhawani Selvaretnam¹ &
Mohammed Belkhatir²

Abstract

The abundance of available knowledge sources has spurred considerable effort in developing and enhancing Information Retrieval techniques. Users’ information needs are expressed in natural language, and successful retrieval depends heavily on how effectively the intended purpose is communicated. Natural language queries consist of multiple linguistic features that serve to represent the intended search goal. Linguistic characteristics that cause semantic ambiguity and misinterpretation of queries, together with additional factors such as unfamiliarity with the search environment, affect users’ ability to represent their information needs accurately, a problem known as the “intention gap”. This gap directly affects the relevance of the returned search results, which may not satisfy users, and is therefore a major issue impacting the effectiveness of information retrieval systems. Central to our discussion is the identification of the significant constituents that characterize the query intent and their enrichment, manually or automatically, through the addition of meaningful terms, phrases or even latent representations that capture their intended meaning. Specifically, we discuss techniques to achieve this enrichment, in particular those that use information gathered from statistical processing of term dependencies within a document corpus or from external knowledge sources such as ontologies. We lay out the anatomy of a generic linguistic-based query expansion framework and propose its module-based decomposition, covering topical issues from query processing, information retrieval, computational linguistics and ontology engineering. For each module, we review state-of-the-art solutions in the literature, categorized and analyzed in light of the techniques used.
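
As a rough illustration of the statistical route mentioned in the abstract, the following minimal sketch expands a query with the terms that most often co-occur with its words in a document corpus. It is not the framework proposed in the article: the function names `build_cooccurrence` and `expand_query`, the document-level co-occurrence score, and the toy corpus `DOCS` are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(docs):
    """Count how often two distinct terms appear in the same document."""
    cooc = Counter()
    for doc in docs:
        # Use the set of terms so repeated words in one document count once.
        terms = sorted(set(doc.lower().split()))
        for a, b in combinations(terms, 2):
            cooc[(a, b)] += 1
            cooc[(b, a)] += 1
    return cooc


def expand_query(query, cooc, k=2):
    """Append up to k terms that co-occur most often with the query terms."""
    q_terms = query.lower().split()
    scores = Counter()
    for (a, b), n in cooc.items():
        if a in q_terms and b not in q_terms:
            scores[b] += n
    # Sort by descending score, then alphabetically, so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return q_terms + [term for term, _ in ranked[:k]]


# Hypothetical toy corpus, for illustration only.
DOCS = [
    "jaguar car engine speed",
    "jaguar car dealer price",
    "jaguar cat jungle habitat",
]
```

With this toy corpus, `expand_query("jaguar car", build_cooccurrence(DOCS))` returns `['jaguar', 'car', 'dealer', 'engine']`: the added terms come from the documents that share both query words, which nudges the ambiguous term "jaguar" toward its automotive sense. An ontology-based variant would draw candidate terms from an external resource instead of corpus counts.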

Author information

Authors and Affiliations

Faculty of Information Technology, Monash University, Sunway, Malaysia
Bhawani Selvaretnam
Faculty of Information Technology, Lyon Institute of Technology, University of Lyon, Lyon, France
Mohammed Belkhatir

Corresponding author

Correspondence to Mohammed Belkhatir.

About this article

Cite this article

Selvaretnam, B., Belkhatir, M. Natural language technology and query expansion: issues, state-of-the-art and perspectives. J Intell Inf Syst 38, 709–740 (2012). https://doi.org/10.1007/s10844-011-0174-3

Received: 20 April 2011
Revised: 04 July 2011
Accepted: 27 July 2011
Published: 27 August 2011
Issue Date: June 2012
DOI: https://doi.org/10.1007/s10844-011-0174-3
