DOI: 10.1145/345508.345593

Hierarchical classification of Web content

Published: 01 July 2000

Abstract

This paper explores the use of hierarchical structure for classifying a large, heterogeneous collection of web content. The hierarchical structure is initially used to train different second-level classifiers. In the hierarchical case, a model is learned to distinguish a second-level category from other categories within the same top level. In the flat non-hierarchical case, a model distinguishes a second-level category from all other second-level categories. Scoring rules can further take advantage of the hierarchy by considering only second-level categories that exceed a threshold at the top level.
We use support vector machine (SVM) classifiers, which have been shown to be efficient and effective for classification, but not previously explored in the context of hierarchical classification. We found small advantages in accuracy for hierarchical models over flat models. For the hierarchical approach, we found the same accuracy using a sequential Boolean decision rule and a multiplicative decision rule. Since the sequential approach is much more efficient, requiring only 14%-16% of the comparisons used in the other approaches, we find it to be a good choice for classifying text into large hierarchical structures.
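
The two decision rules compared in the abstract can be illustrated with a minimal sketch. It assumes each classifier returns a probability-like score between 0 and 1. The category names, scores, and threshold values below are hypothetical and are not taken from the paper, and plain dictionaries stand in for trained SVM classifiers.

```python
from typing import Callable, Dict, List, Tuple

Scorer = Callable[[str], float]  # maps a category name to a probability-like score


def sequential_boolean(hierarchy: Dict[str, List[str]], top: Scorer, second: Scorer,
                       t_top: float, t_second: float) -> Tuple[List[str], int]:
    """Evaluate second-level classifiers only under top-level categories above t_top."""
    assigned: List[str] = []
    evaluations = 0  # number of classifier evaluations performed
    for parent, children in hierarchy.items():
        evaluations += 1
        if top(parent) <= t_top:
            continue  # prune: skip every second-level classifier under this parent
        for child in children:
            evaluations += 1
            if second(child) > t_second:
                assigned.append(child)
    return assigned, evaluations


def multiplicative(hierarchy: Dict[str, List[str]], top: Scorer, second: Scorer,
                   threshold: float) -> Tuple[List[str], int]:
    """Combine top- and second-level scores by product; nothing is pruned."""
    assigned: List[str] = []
    evaluations = 0
    for parent, children in hierarchy.items():
        evaluations += 1
        p_top = top(parent)
        for child in children:
            evaluations += 1
            if p_top * second(child) > threshold:
                assigned.append(child)
    return assigned, evaluations


# Hypothetical two-level hierarchy and scores for a single document.
hierarchy = {"Computers": ["Hardware", "Software"], "Sports": ["Soccer", "Tennis"]}
top_scores = {"Computers": 0.9, "Sports": 0.1}
second_scores = {"Hardware": 0.2, "Software": 0.8, "Soccer": 0.7, "Tennis": 0.1}

seq_labels, seq_evals = sequential_boolean(
    hierarchy, top_scores.__getitem__, second_scores.__getitem__, 0.5, 0.5)
mul_labels, mul_evals = multiplicative(
    hierarchy, top_scores.__getitem__, second_scores.__getitem__, 0.25)
print(seq_labels, seq_evals)
print(mul_labels, mul_evals)
```

In this toy case both rules assign the same label, but the sequential rule skips the second-level classifiers under the rejected top-level category. That pruning is the kind of saving the abstract attributes to the sequential approach.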


Published In

SIGIR '00: Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
July 2000
396 pages
ISBN:1581132263
DOI:10.1145/345508

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Web hierarchies
  2. classification
  3. hierarchical models
  4. machine learning
  5. support vector machines
  6. text categorization
  7. text classification

Qualifiers

  • Article

Conference

SIGIR00
Sponsor:
  • Greek Com Soc
  • SIGIR
  • Athens U of Econ & Business

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%
