Abstract
Automatic induction of semantic verb classes is one of the most challenging tasks in computational lexical semantics, with a wide variety of applications in natural language processing. The large number of Persian speakers and the lack of such semantic classes for Persian verbs motivated us to apply unsupervised algorithms to Persian verb clustering. In this paper, we report experiments on inducing the semantic classes of Persian verbs based on Levin’s theory of verb classes. Syntactic information extracted from dependency trees is used as the base features for clustering the verbs. Since no manual classification of Persian verbs existed prior to this work, we prepared a manual classification of 265 verbs into 43 semantic classes. We show that the spectral clustering algorithm outperforms K-means and improves on the baseline algorithm by about 17% in F-measure and 0.13 in Rand index.
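The two evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes a pair-counting formulation, in which every pair of verbs is checked for whether the gold classes and the induced clusters agree on grouping them together. The abstract does not say which F-measure variant the paper uses, so the pairwise F-measure here is an assumption, and the verb labels and function names are hypothetical placeholders rather than data or code from the paper.

```python
from itertools import combinations


def pair_counts(gold, pred):
    """Count verb pairs by agreement between gold classes and clusters.

    gold and pred map each verb to a class label / cluster id.
    Returns (tp, fp, fn, tn) over all unordered verb pairs.
    """
    tp = fp = fn = tn = 0
    for a, b in combinations(sorted(gold), 2):
        same_gold = gold[a] == gold[b]
        same_pred = pred[a] == pred[b]
        if same_gold and same_pred:
            tp += 1  # grouped together in both
        elif same_pred:
            fp += 1  # clustered together, but in different gold classes
        elif same_gold:
            fn += 1  # same gold class, but split by the clustering
        else:
            tn += 1  # kept apart in both
    return tp, fp, fn, tn


def rand_index(gold, pred):
    """Fraction of verb pairs on which gold and clustering agree."""
    tp, fp, fn, tn = pair_counts(gold, pred)
    return (tp + tn) / (tp + fp + fn + tn)


def pairwise_f_measure(gold, pred):
    """Harmonic mean of pairwise precision and recall (assumed variant)."""
    tp, fp, fn, _ = pair_counts(gold, pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example with placeholder verbs (not taken from the paper's data).
gold = {"v1": "A", "v2": "A", "v3": "B", "v4": "B"}
pred = {"v1": 0, "v2": 0, "v3": 0, "v4": 1}

print(rand_index(gold, pred))
print(pairwise_f_measure(gold, pred))
```

Both scores depend only on whether two verbs share a label, so cluster ids need not match gold class names. This is what makes such measures usable for comparing an unsupervised clustering against a manual classification.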
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Aminian, M., Rasooli, M.S., Sameti, H. (2013). Unsupervised Induction of Persian Semantic Verb Classes Based on Syntactic Information. In: Kłopotek, M.A., Koronacki, J., Marciniak, M., Mykowiecka, A., Wierzchoń, S.T. (eds) Language Processing and Intelligent Information Systems. IIS 2013. Lecture Notes in Computer Science, vol 7912. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38634-3_13
DOI: https://doi.org/10.1007/978-3-642-38634-3_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38633-6
Online ISBN: 978-3-642-38634-3
eBook Packages: Computer Science, Computer Science (R0)