Abstract
Recognizing morphological variation and conceptual proximity between words is crucial for tasks that rely on lexical normalization, such as term generation and matching in information retrieval. We present tools that automatically perform nominalization as a form of lexical normalization for Portuguese. We compare three alternative strategies (stemming, lemmatization, and our proposal, nominalization) in an experimental evaluation and show that nominalization, used as lexical normalization, contributes to improved performance in a probabilistic information retrieval approach for Portuguese.
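To make the contrast between the three strategies concrete, the following is a minimal sketch on four Portuguese word forms. It is not the tools described in the paper: the suffix list, the lemma table, the verb-to-noun table, and all function names are hand-written assumptions chosen only for illustration.

```python
# Toy illustration only. STEM_SUFFIXES, LEMMAS and NOMINALS are hypothetical,
# hand-written tables, not the rules or lexicon used by the paper's tools.

# Crude suffix stripping: the first matching suffix is removed.
STEM_SUFFIXES = ["ções", "ção", "ir", "iu"]

# Inflected form -> dictionary form (lemma).
LEMMAS = {
    "construiu": "construir",      # past tense -> infinitive
    "construções": "construção",   # plural -> singular
}

# Lemma -> related noun (assumed mapping for this example).
NOMINALS = {
    "construir": "construção",     # verb -> noun
}


def stem(word: str) -> str:
    """Strip the first matching suffix; the result need not be a real word."""
    for suffix in STEM_SUFFIXES:
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word


def lemmatize(word: str) -> str:
    """Map an inflected form to its lemma, keeping the part of speech."""
    return LEMMAS.get(word, word)


def nominalize(word: str) -> str:
    """Lemmatize, then map the lemma to a related noun when one is listed."""
    lemma = lemmatize(word)
    return NOMINALS.get(lemma, lemma)


def index_terms(words, strategy):
    """Return the set of distinct index terms a strategy produces."""
    return {strategy(w) for w in words}


WORDS = ["construir", "construiu", "construção", "construções"]

for name, strategy in [("stem", stem), ("lemma", lemmatize), ("nominal", nominalize)]:
    print(name, index_terms(WORDS, strategy))
```

In this toy setting, stemming conflates all four forms into a truncated string, lemmatization keeps the verb and the noun apart, and nominalization conflates them into a single noun that is itself a valid word.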
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gonzalez, M.A.I., de Lima, V.L.S., de Lima, J.V. (2006). Tools for Nominalization: An Alternative for Lexical Normalization. In: Vieira, R., Quaresma, P., Nunes, M.d.G.V., Mamede, N.J., Oliveira, C., Dias, M.C. (eds) Computational Processing of the Portuguese Language. PROPOR 2006. Lecture Notes in Computer Science, vol 3960. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11751984_11
DOI: https://doi.org/10.1007/11751984_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34045-4
Online ISBN: 978-3-540-34046-1
eBook Packages: Computer Science, Computer Science (R0)