Abstract
A common limitation of most existing methods that handle XML structure information is that they ignore the semantic meanings that may be associated with the markup tags. In this paper, we study how to map the structure information available from XML elements onto semantically related concepts, in order to support the generation of semantic features of structural type for XML data. For this purpose, we define an unsupervised word sense disambiguation method that selects the most appropriate meaning for each element in the context of its XML path. The proposed approach exploits the conceptual relations provided by a lexical ontology such as WordNet and employs different notions of sense relatedness. Experiments with data from various application domains show that our approach can be effectively used to generate structural semantic features.
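To make path-contextual sense selection concrete, here is a minimal Python sketch. It is not the authors' algorithm. It assumes a tiny hypothetical sense inventory (`SENSES`, with made-up WordNet-style sense IDs and glosses) in place of WordNet, and it uses plain gloss overlap (a basic Lesk-style measure) as its only notion of sense relatedness. For each tag, it picks the candidate sense whose summed best relatedness to the other tags on the same path is highest. The rewritten path of senses then plays the role of a structural semantic feature. The helper names `disambiguate_path` and `semantic_path` are illustrative only.

```python
from __future__ import annotations

# Hypothetical toy sense inventory standing in for a lexical ontology such as
# WordNet: each tag name maps to candidate sense IDs and their glosses.
SENSES = {
    "book": {
        "book.n.01": "a written work or composition that has been published",
        "book.v.01": "engage for a performance or arrange for a reservation in advance",
    },
    "title": {
        "title.n.01": "the name of a written work or composition",
        "title.n.02": "a legal right to property or ownership",
    },
    "author": {
        "author.n.01": "writes a written work or composition professionally",
    },
    "deed": {
        "deed.n.01": "a legal document transferring ownership of property",
        "deed.n.02": "a notable achievement",
    },
}

STOPWORDS = {"a", "an", "the", "or", "of", "to", "for", "in", "that", "has", "been", "and"}


def gloss_words(gloss: str) -> set[str]:
    """Content words of a gloss (lower-cased, stopwords removed)."""
    return {w for w in gloss.lower().split() if w not in STOPWORDS}


def relatedness(tag1: str, sense1: str, tag2: str, sense2: str) -> int:
    """Lesk-style relatedness: number of content words shared by two glosses."""
    return len(gloss_words(SENSES[tag1][sense1]) & gloss_words(SENSES[tag2][sense2]))


def disambiguate_path(path: str) -> list[tuple[str, str | None]]:
    """Pick, for each tag on an XML path, the sense most related to the other
    tags on the same path. Tags missing from the inventory get None."""
    tags = [t for t in path.split("/") if t]
    result = []
    for i, tag in enumerate(tags):
        candidates = SENSES.get(tag)
        if not candidates:
            result.append((tag, None))
            continue

        def score(sense: str) -> int:
            # Sum, over every other known tag on the path, of the best
            # relatedness between this sense and any sense of that tag.
            return sum(
                max(relatedness(tag, sense, other, o) for o in SENSES[other])
                for j, other in enumerate(tags)
                if j != i and other in SENSES
            )

        # max() keeps the first candidate on ties, i.e. inventory order.
        result.append((tag, max(candidates, key=score)))
    return result


def semantic_path(path: str) -> str:
    """Structural semantic feature: the path rewritten as a sequence of senses."""
    return "/" + "/".join(sense or tag for tag, sense in disambiguate_path(path))
```

With this toy inventory, `semantic_path("/book/title")` yields `/book.n.01/title.n.01`, while `semantic_path("/deed/title")` yields `/deed.n.01/title.n.02`: the same tag receives a different sense depending on its path context. A real implementation would draw senses and glosses from WordNet and could swap in other relatedness measures.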
Keywords
- Word Sense Disambiguation
- Sense Relatedness
- Lexical Ontology
- Word Sense Disambiguation Method
- Word Sense Disambiguation Algorithm
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tagarelli, A., Longo, M., Greco, S. (2009). Word Sense Disambiguation for XML Structure Feature Generation. In: Aroyo, L., et al. The Semantic Web: Research and Applications. ESWC 2009. Lecture Notes in Computer Science, vol 5554. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02121-3_14
DOI: https://doi.org/10.1007/978-3-642-02121-3_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02120-6
Online ISBN: 978-3-642-02121-3
eBook Packages: Computer Science, Computer Science (R0)