Abstract
This paper proposes a topic selection method for web documents that uses an ontology hierarchy. The idea is to exploit the structure of a domain ontology to determine the topic of a web document, and thereby to improve the performance of document clustering. In the preprocessing step, we extract keywords from the web documents using a term frequency formula, and we build the domain ontology by branching off a partial hierarchy from WordNet with an automatic domain ontology building tool. We then select a topic for each web document based on the structure of the domain ontology. We find that this approach contributes to efficient document clustering.
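The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy `PARENT` hierarchy stands in for a partial hierarchy cut from WordNet, the stopword list is illustrative, and the selection rule (pick the deepest concept whose subtree covers at least `min_coverage` of the ontology-matched keyword weight) is an assumption made here for demonstration, since the abstract does not specify the rule.

```python
import re
from collections import Counter

# Hypothetical toy domain ontology (child -> parent). In the paper this role is
# played by a partial hierarchy branched off from WordNet.
PARENT = {
    "vehicle": "entity",
    "car": "vehicle",
    "truck": "vehicle",
    "sedan": "car",
    "animal": "entity",
    "dog": "animal",
}
CONCEPTS = set(PARENT) | set(PARENT.values())

# Illustrative stopword list (assumption, not from the paper).
STOPWORDS = {"the", "a", "and", "is", "of", "to", "in"}


def term_frequencies(text):
    """Relative term frequency of each non-stopword token."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    if not tokens:
        return {}
    counts = Counter(tokens)
    return {term: n / len(tokens) for term, n in counts.items()}


def ancestors(concept):
    """The concept itself followed by its ancestors up to the root."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def select_topic(text, min_coverage=0.5):
    """Pick the deepest concept covering >= min_coverage of matched keyword weight.

    This rule is an assumed stand-in for the paper's selection method.
    """
    weights = {t: w for t, w in term_frequencies(text).items() if t in CONCEPTS}
    total = sum(weights.values())
    if total == 0:
        return None  # no keyword matched the ontology

    # Propagate each keyword's weight to itself and all of its ancestors.
    score = Counter()
    for term, weight in weights.items():
        for concept in ancestors(term):
            score[concept] += weight

    candidates = [c for c in score if score[c] / total >= min_coverage]
    # Prefer the most specific (deepest) candidate; break ties by score.
    return max(candidates, key=lambda c: (len(ancestors(c)) - 1, score[c]))


doc = "The sedan is a car. The truck and the car share the road with a dog."
print(select_topic(doc))
print(select_topic(doc, min_coverage=0.7))
```

Raising `min_coverage` pushes the selected topic toward more general concepts in the hierarchy, which is one simple way to trade topic specificity against how much of the document the topic accounts for.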
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kong, H., Hwang, M., Hwang, G., Shim, J., Kim, P. (2006). Topic Selection of Web Documents Using Specific Domain Ontology. In: Gelbukh, A., Reyes-Garcia, C.A. (eds) MICAI 2006: Advances in Artificial Intelligence. MICAI 2006. Lecture Notes in Computer Science, vol 4293. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11925231_100
DOI: https://doi.org/10.1007/11925231_100
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49026-5
Online ISBN: 978-3-540-49058-6
eBook Packages: Computer Science