Abstract
This paper proposes a method for automatically constructing a common-sense attribute knowledge base in Chinese. The method first uses word-formation information to bootstrap an initial attribute set from a machine-readable dictionary and then extends that set iteratively using the World Wide Web. Determining the defining concepts of the attributes is modeled as a selectional preference resolution problem. The acquired attribute knowledge base is compared with HowNet, a hand-coded lexical knowledge source, and experimental results on the method's performance are reported.
This work was supported by the NSFC under Grant No. 60496326 (Basic Theory and Core Techniques of Non Canonical Knowledge).
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhao, J., Gao, Y., Liu, H., Lu, R. (2007). Automatic Construction of a Lexical Attribute Knowledge Base. In: Zhang, Z., Siekmann, J. (eds) Knowledge Science, Engineering and Management. KSEM 2007. Lecture Notes in Computer Science, vol 4798. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76719-0_22
DOI: https://doi.org/10.1007/978-3-540-76719-0_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-76718-3
Online ISBN: 978-3-540-76719-0
eBook Packages: Computer Science, Computer Science (R0)