Abstract
Previous approaches to the automatic extraction of lexical similarities have treated the word as the semantic unit of text. However, the theoretical perspective of contextual lexical semantics suggests that larger segments of text, specifically non-compositional multiwords, are more appropriate for this role. We tested the applicability of this notion experimentally, applying automatic collocation extraction to identify and merge such multiwords prior to the similarity estimation process. Using an automatic comparative evaluation scheme, we ascertain an improvement in the extracted lexico-semantic knowledge.
The presented work is supported by the GEMINI (IST-2001-32343) EC project.
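The preprocessing idea in the abstract (find collocations, then merge them into single tokens before estimating similarity) can be illustrated with a minimal sketch. The abstract does not name the collocation metric or thresholds, so everything here is an assumption made for illustration: pointwise mutual information (PMI) as the association score, the function names `score_bigrams` and `merge_multiwords`, the `min_count` and threshold values, and the underscore joiner. It is not the authors' implementation.

```python
import math
from collections import Counter


def score_bigrams(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    PMI is an assumed stand-in metric; the abstract does not say which
    collocation measure was actually used.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # skip rare pairs, whose PMI estimates are unreliable
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


def merge_multiwords(tokens, collocations, joiner="_"):
    """Replace each selected bigram with one merged token, left to right."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in collocations:
            merged.append(tokens[i] + joiner + tokens[i + 1])
            i += 2  # consume both words of the multiword
        else:
            merged.append(tokens[i])
            i += 1
    return merged


# Toy corpus; a real experiment would use a large text collection.
tokens = ("he moved to new york because new york has hot dog stands "
          "and he likes a hot dog").split()
scores = score_bigrams(tokens, min_count=2)
collocations = {pair for pair, s in scores.items() if s > 1.0}  # assumed threshold
merged = merge_multiwords(tokens, collocations)
print(merged)
```

After merging, a downstream similarity estimator would see `new_york` and `hot_dog` as single units with their own context distributions, rather than as the separate words `new`, `york`, `hot`, and `dog`.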
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Thanopoulos, A., Fakotakis, N., Kokkinakis, G. (2003). Identification of Multiwords as Preprocessing for Automatic Extraction of Lexical Similarities. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2003. Lecture Notes in Computer Science, vol 2807. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39398-6_14
DOI: https://doi.org/10.1007/978-3-540-39398-6_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20024-6
Online ISBN: 978-3-540-39398-6
eBook Packages: Springer Book Archive