Abstract
We propose a method for acquiring attribute words for a wide range of objects from Japanese Web documents. The method is simple and unsupervised, and it uses word statistics, lexico-syntactic patterns, and HTML tags. To evaluate the acquired attribute words, we also establish criteria and a procedure based on the question-answerability of each candidate word.
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tokunaga, K., Kazama, J., Torisawa, K. (2005). Automatic Discovery of Attribute Words from Web Documents. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_10
DOI: https://doi.org/10.1007/11562214_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29172-5
Online ISBN: 978-3-540-31724-1
eBook Packages: Computer Science, Computer Science (R0)