Abstract
Manual construction of a wordnet can be facilitated by a system that suggests semantic relations acquired from corpora. Such systems tend to produce many wrong suggestions. We propose a method of filtering a raw list of noun pairs potentially linked by hypernymy, and test it on Polish. The method aims for good recall and sufficient precision. The classifiers work with complex features that give clues on the relation between the nouns. We apply a corpus-based measure of semantic relatedness enhanced with a Rank Weight Function. The evaluation is based on the data in Polish WordNet. The results compare favourably with similar methods applied to English, despite the small size of Polish WordNet.
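The filtering step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy scores stand in for the output of a trained classifier, and the names `filter_pairs`, `precision_recall`, `toy_scores` and `gold` are hypothetical. The sketch only shows how lowering the decision threshold trades precision for recall when candidate pairs are checked against relations already recorded in a wordnet.

```python
from typing import Callable, Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (candidate hyponym, candidate hypernym)


def filter_pairs(candidates: Iterable[Pair],
                 score: Callable[[Pair], float],
                 threshold: float) -> List[Pair]:
    """Keep the candidate pairs whose score reaches the threshold."""
    return [pair for pair in candidates if score(pair) >= threshold]


def precision_recall(kept: Iterable[Pair], gold: Set[Pair]) -> Tuple[float, float]:
    """Compare the kept pairs with gold-standard hypernymy pairs."""
    kept_set = set(kept)
    true_positives = len(kept_set & gold)
    precision = true_positives / len(kept_set) if kept_set else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Assumption: invented scores, standing in for a real classifier's output.
toy_scores = {
    ("pies", "ssak"): 0.9,   # dog -> mammal
    ("kot", "ssak"): 0.6,    # cat -> mammal
    ("pies", "kot"): 0.4,    # related nouns, but not hypernymy
    ("dom", "ssak"): 0.1,    # unrelated nouns
}
# Assumption: invented gold standard, standing in for wordnet data.
gold = {("pies", "ssak"), ("kot", "ssak")}

strict = filter_pairs(toy_scores, toy_scores.get, threshold=0.7)
lenient = filter_pairs(toy_scores, toy_scores.get, threshold=0.3)

strict_pr = precision_recall(strict, gold)
lenient_pr = precision_recall(lenient, gold)
```

In this toy setting the strict threshold keeps only one correct pair and so misses half of the gold relations, while the lenient threshold recovers both at the cost of one wrong suggestion, which mirrors the stated preference for good recall with sufficient precision.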
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Piasecki, M., Szpakowicz, S., Marcińczuk, M., Broda, B. (2008). Classification-Based Filtering of Semantic Relatedness in Hypernymy Extraction. In: Nordström, B., Ranta, A. (eds) Advances in Natural Language Processing. GoTAL 2008. Lecture Notes in Computer Science, vol 5221. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85287-2_38
DOI: https://doi.org/10.1007/978-3-540-85287-2_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85286-5
Online ISBN: 978-3-540-85287-2
eBook Packages: Computer Science (R0)