Abstract
This paper presents experiments evaluating a statistical stemming algorithm based on morphological segmentation. The method estimates the affixality of word fragments by combining three indexes associated with each possible cut. This unsupervised, language-independent method was readily adapted to produce an effective morphological stemmer. The stemmer was coupled with Cortex, an automatic summarization system, to generate summaries in English, Spanish and French, which were evaluated using ROUGE. The results of this extrinsic evaluation show that our stemming algorithm outperforms several classical systems.
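To make the evaluation step concrete, the sketch below computes ROUGE-N recall, the fraction of reference n-grams that also appear in a candidate summary, with clipped counts. It is a minimal illustration and not the ROUGE package used in the paper: the function names `ngrams` and `rouge_n_recall` are hypothetical, and lowercasing plus whitespace tokenization are simplifying assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall of a candidate summary against one reference.

    Assumption: both texts are lowercased and split on whitespace;
    a real evaluation would use a proper tokenizer.
    """
    cand_counts = Counter(ngrams(candidate.lower().split(), n))
    ref_counts = Counter(ngrams(reference.lower().split(), n))
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0
    # Clip each match at the number of times the n-gram occurs in the candidate.
    overlap = sum(min(count, cand_counts[gram]) for gram, count in ref_counts.items())
    return overlap / total


if __name__ == "__main__":
    print(rouge_n_recall("the cat sat", "the cat sat on the mat", n=1))
    print(rouge_n_recall("the cat sat", "the cat sat on the mat", n=2))
```

In an extrinsic setup such as the one described, a score of this kind would be computed on the summaries produced with each stemmer, so that differences in score can be attributed to the stemming step.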
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Méndez-Cruz, CF., Torres-Moreno, JM., Medina-Urrea, A., Sierra, G. (2013). Extrinsic Evaluation on Automatic Summarization Tasks: Testing Affixality Measurements for Statistical Word Stemming. In: Batyrshin, I., Mendoza, M.G. (eds) Advances in Computational Intelligence. MICAI 2012. Lecture Notes in Computer Science, vol 7630. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37798-3_5
DOI: https://doi.org/10.1007/978-3-642-37798-3_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37797-6
Online ISBN: 978-3-642-37798-3
eBook Packages: Computer Science (R0)