Abstract
Conventional automatic text annotation tools mostly extract named entities from texts and annotate them with entity types such as persons, locations, and dates. This kind of entity-type information, however, is insufficient for machines to understand the context or facts contained in the texts. This paper presents a general text categorization approach that assigns text segments to broader subject categories, for example classifying a text string as a paper title in Mathematics or as a conference name in Computer Science. Experimental results confirm its wide applicability to various digital library applications.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chiao, HC., Pu, HT., Chien, LF. (2005). Annotating Text Segments Using a Web-Based Categorization Approach. In: Fox, E.A., Neuhold, E.J., Premsmit, P., Wuwongse, V. (eds) Digital Libraries: Implementing Strategies and Sharing Experiences. ICADL 2005. Lecture Notes in Computer Science, vol 3815. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11599517_37
DOI: https://doi.org/10.1007/11599517_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30850-8
Online ISBN: 978-3-540-32291-7
eBook Packages: Computer Science, Computer Science (R0)