Abstract
Automatic discovery of part-whole relations is a fundamental problem in information extraction. In this paper, we present an unsupervised approach to learning lexical patterns from an online encyclopedia for extracting part-whole relations. The only input is a small set of seed part-whole instances. To tackle the term recognition problem, terms from the domain of the seeds are extracted using the semantic information contained in the online encyclopedia. Instead of collecting only sentences that contain relation instances from the seeds, we introduce a novel process to select sentences that may indicate part-whole relations. Patterns are produced from these sentences by replacing the terms with Part and Whole tags. A similarity measure based on a new edit distance is used, and an algorithm is described to cluster similar patterns. We rank the pattern clusters by frequency, and patterns from the top-k clusters are applied to identify new part-whole relations. Experimental results show that our method extracts abundant part-whole relations and compares favorably in precision with other state-of-the-art approaches.
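The pattern steps summarized above (tagging terms, measuring pattern similarity, clustering, and keeping the top-k clusters) can be illustrated with a minimal sketch. The abstract does not specify the new edit distance or the clustering algorithm, so this sketch substitutes a standard token-level Levenshtein distance and a greedy single-pass clustering. All function names, the `<Part>`/`<Whole>` tag spelling, the 0.6 threshold, and the English example sentences are assumptions made for illustration, not the authors' implementation.

```python
def make_pattern(tokens, part, whole):
    """Replace the part and whole terms in a tokenized sentence with tags."""
    return tuple(
        "<Part>" if t == part else "<Whole>" if t == whole else t
        for t in tokens
    )


def token_edit_distance(a, b):
    """Standard Levenshtein distance over tokens (a stand-in, assumed here,
    for the paper's unspecified edit distance)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Normalize the distance into a similarity score in [0, 1]."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - token_edit_distance(a, b) / longest


def cluster_patterns(patterns, threshold=0.6):
    """Greedy clustering: a pattern joins the first cluster whose
    representative (its first member) is similar enough."""
    clusters = []
    for p in patterns:
        for c in clusters:
            if similarity(p, c[0]) >= threshold:
                c.append(p)
                break
        else:
            clusters.append([p])
    return clusters


def top_k_patterns(patterns, k, threshold=0.6):
    """Rank clusters by frequency (size) and return the distinct patterns
    of the k largest clusters."""
    clusters = sorted(cluster_patterns(patterns, threshold),
                      key=len, reverse=True)
    selected = []
    for c in clusters[:k]:
        selected.extend(dict.fromkeys(c))  # drop duplicates, keep order
    return selected


# Hypothetical seed sentences, already tokenized.
p1 = make_pattern("the engine is part of the car".split(), "engine", "car")
p2 = make_pattern("the wheel is a part of the bicycle".split(),
                  "wheel", "bicycle")
p3 = make_pattern("a house consists of a roof".split(), "roof", "house")

clusters = cluster_patterns([p1, p2, p3])
best = top_k_patterns([p1, p2, p3], k=1)
```

In this toy run, the two "is (a) part of" patterns differ by a single token and fall into one cluster, while the "consists of" pattern forms its own. A real system would operate on segmented encyclopedia text and tune the threshold and k empirically.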
Copyright information
© 2014 IFIP International Federation for Information Processing
About this paper
Cite this paper
Xia, F., Cao, C. (2014). Extracting Part-Whole Relations from Online Encyclopedia. In: Shi, Z., Wu, Z., Leake, D., Sattler, U. (eds) Intelligent Information Processing VII. IIP 2014. IFIP Advances in Information and Communication Technology, vol 432. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44980-6_7
DOI: https://doi.org/10.1007/978-3-662-44980-6_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44979-0
Online ISBN: 978-3-662-44980-6
eBook Packages: Computer Science, Computer Science (R0)