Improving large-scale hierarchical classification by rewiring: a data-driven filter based approach

Journal of Intelligent Information Systems

Abstract

Hierarchical Classification (HC) is a supervised learning problem where unlabeled instances are classified into a taxonomy of classes. Several methods that utilize the hierarchical structure have been developed to improve HC performance. However, in most cases the hierarchical structure defined a priori by domain experts is inconsistent; as a consequence, the performance improvement over flat classification methods is not noticeable. We propose a scalable, data-driven, filter-based rewiring approach to modify an expert-defined hierarchy. Experimental comparisons of top-down hierarchical classification with our modified hierarchy on a wide range of datasets show improved classification performance over the baseline hierarchy (i.e., the one defined by experts), clustered hierarchies, and flattening-based hierarchy modification approaches. In comparison to existing rewiring approaches, our method (rewHier) is computationally efficient, enabling it to scale to datasets with large numbers of classes, instances, and features. We also show that our modified hierarchy leads to improved classification performance for classes with few training samples in comparison to flat and state-of-the-art hierarchical classification approaches. Source Code: https://cs.gmu.edu/~mlbio/TaxMod/
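To make the terms in the abstract concrete, the following is a minimal sketch of top-down hierarchical classification and of what rewiring a hierarchy means. It is not the authors' rewHier method: the toy taxonomy, the 2-D centroids, the nearest-centroid routing rule, and the function names `predict_top_down` and `rewire` are all assumptions made for illustration, and the node to rewire is picked by hand rather than by a data-driven filter.

```python
from math import dist  # Euclidean distance, Python 3.8+

# Hypothetical toy taxonomy (parent -> children); leaves are the classes.
# "physics" is deliberately misplaced under "sports" to mimic an
# inconsistent expert-defined hierarchy.
hierarchy = {
    "root": ["sports", "science"],
    "sports": ["soccer", "physics"],
    "science": ["biology"],
}

# Hypothetical 2-D centroids for the leaf classes.
centroids = {"soccer": (0.0, 0.0), "physics": (10.0, 10.0), "biology": (9.0, 11.0)}


def leaves(h, node):
    """Return all leaf classes below `node` (or the node itself if it is a leaf)."""
    kids = h.get(node, [])
    if not kids:
        return [node]
    return [leaf for kid in kids for leaf in leaves(h, kid)]


def node_centroid(h, node):
    """Represent a node by the mean of the centroids of its leaf classes."""
    points = [centroids[leaf] for leaf in leaves(h, node)]
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def predict_top_down(h, x, node="root"):
    """Route x from the root to a leaf, choosing the nearest child at each level."""
    while h.get(node):
        node = min(h[node], key=lambda child: dist(x, node_centroid(h, child)))
    return node


def rewire(h, child, new_parent):
    """Return a copy of the hierarchy with `child` moved under `new_parent`."""
    new_h = {p: [c for c in kids if c != child] for p, kids in h.items()}
    new_h[new_parent] = new_h.get(new_parent, []) + [child]
    return new_h


x = (9.0, 9.0)  # an instance closest to the "physics" centroid
print(predict_top_down(hierarchy, x))                                # biology
print(predict_top_down(rewire(hierarchy, "physics", "science"), x))  # physics
```

In the original hierarchy the instance is routed to "science" at the first level and can no longer reach "physics", so the error made at the top propagates downward. Moving "physics" under "science" removes that inconsistency and the same routing rule then reaches the correct leaf.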

References

  • Aggarwal, C., Gates, S., Yu, P. (1999). On the merits of building categorization systems by supervised clustering. In SIGKDD (pp. 352–356).

  • Babbar, R., Partalas, I., Gaussier, E., Amini, M. (2013a). On flat versus hierarchical classification in large-scale taxonomies. In NIPS (pp. 1824–1832).

  • Babbar, R., Partalas, I., Gaussier, E., Amini, M. R. (2013b). Maximum-margin framework for training data synchronization in large-scale hierarchical classification. In Neural Information Processing (pp. 336–343).

  • Cai, L., & Hofmann, T. (2004). Hierarchical document categorization with support vector machines. In CIKM (pp. 78–87).

  • Charuvaka, A., & Rangwala, H. (2015). HierCost: Improving large scale hierarchical classification with cost sensitive learning. In ECML PKDD.

  • Chuang, S., & Chien, L. (2004). A practical web-based approach to generating topic hierarchy for text segments. In CIKM (pp. 127–136).

  • Dimitrovski, I., Kocev, D., Loskovska, S., Džeroski, S. (2011). Hierarchical annotation of medical images. Pattern Recognition, 44(10), 2436–2449.

  • Dimitrovski, I., Kocev, D., Loskovska, S., Džeroski, S. (2012). Hierarchical classification of diatom images using predictive clustering trees. Ecological Informatics, 7, 19–29.

  • Dumais, S., & Chen, H. (2000). Hierarchical classification of web content. In ACM SIGIR (pp. 256–263).

  • Gao, T., & Koller, D. (2011). Discriminative learning of relaxed hierarchy for large-scale visual recognition. In ICCV (pp. 2072–2079).

  • Gopal, S., & Yang, Y. (2013). Recursive regularization for large-scale classification with hierarchical & graphical dependencies. In ACM SIGKDD (pp. 257–265).

  • Koller, D., & Sahami, M. (1997). Hierarchically classifying documents using very few words. In ICML (pp. 170–178).

  • Kosmopoulos, A., Partalas, I., Gaussier, E., Paliouras, G., Androutsopoulos, I. (2015). Evaluation measures for hierarchical classification: a unified view and novel approaches. Data Mining and Knowledge Discovery, 29(3), 820–865.

  • Li, T., Zhu, S., Ogihara, M. (2007). Hierarchical document classification using automatically generated hierarchy. JIIS, 29(2), 211–230.

  • Liu, T., Wan, H., Qin, T., Chen, Z., Ren, Y., Ma, W. (2005). Site abstraction for rare category classification in large-scale web directory. In WWW: Special interest tracks & posters (pp. 1108–1109).

  • Malik, H. (2010). Improving hierarchical SVMs by hierarchy flattening and lazy classification. In Large-Scale HC Workshop of ECIR.

  • McCallum, A., Rosenfeld, R., Mitchell, T., Ng, A. (1998). Improving text classification by shrinkage in a hierarchy of classes. In ICML (pp. 359–367).

  • Naik, A., & Rangwala, H. (2016a). Filter based taxonomy modification for improving hierarchical classification. arXiv:1603.00772.

  • Naik, A., & Rangwala, H. (2016b). Inconsistent node flattening for improving top-down hierarchical classification. In IEEE DSAA (pp. 379–388).

  • Naik, A., & Rangwala, H. (2017a). HierFlat: flattened hierarchies for improving top-down hierarchical classification. International Journal of Data Science and Analytics, 4(3), 191–208.

  • Naik, A., & Rangwala, H. (2017b). Integrated framework for improving large-scale hierarchical classification. In 16th IEEE International Conference on Machine Learning and Applications (ICMLA) (pp. 281–288).

  • Nitta, K. (2010). Improving taxonomies for large-scale hierarchical classifiers of web docs. In CIKM (pp. 1649–1652).

  • Punera, K., Rajan, S., Ghosh, J. (2005). Automatically learning document taxonomies for hierarchical classification. In WWW: Special interest tracks & posters.

  • Qi, X., & Davison, B. (2011). Hierarchy evolution for improved classification. In CIKM (pp. 2193–2196).

  • Silla, C.N., Jr., & Freitas, A.A. (2011). A survey of hierarchical classification across different application domains. DMKD, 22(1-2), 31–72.

  • Steinbach, M., Ertöz, L., Kumar, V. (2004). The challenges of clustering high dimensional data. In New directions in statistical physics (pp. 273–309).

  • Sun, A., & Lim, E. (2001). Hierarchical text classification and evaluation. In ICDM (pp. 521–528).

  • Tang, L., Zhang, J., Liu, H. (2006). Acclimatizing taxonomic semantics for hierarchical content classification. In ACM SIGKDD (pp. 384–393).

  • Vens, C., Struyf, J., Schietgat, L., Džeroski, S., Blockeel, H. (2008). Decision trees for hierarchical multi-label classification. Machine Learning, 73(2), 185–214.

  • Victor, G. S., Antonia, P., Spyros, S. (2014). CSMR: A scalable algorithm for text clustering with cosine similarity and MapReduce. In IFIP International Conference on Artificial Intelligence Applications and Innovations (pp. 211–220): Springer.

  • Wang, X., & Lu, B. (2010). Flatten hierarchies for large-scale hierarchical text categorization. In ICDIM (pp. 139–144).

  • Wang, J., Shen, H.T., Song, J., Ji, J. (2014). Hashing for similarity search: A survey. arXiv:1408.2927.

  • Xiao, L., Zhou, D., Wu, M. (2011). Hierarchical classification via orthogonal transfer. In ICML (pp. 801–808).

  • Yang, Y., & Liu, X. (1999). A re-examination of text categorization methods. In ACM SIGIR (pp. 42–49).

  • Zimek, A., Buchwald, F., Frank, E., Kramer, S. (2010). A study of hierarchical and flat classification of proteins. IEEE/ACM TCBB, 7(3), 563–571.

Acknowledgements

The authors gratefully acknowledge support for the work described in this paper from NSF grants #1447489 and #1252318.

Author information

Corresponding author

Correspondence to Azad Naik.

Ethics declarations

Conflict of interests

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

A preliminary version of this paper was made available on arXiv (Naik and Rangwala 2016a).

About this article

Cite this article

Naik, A., Rangwala, H. Improving large-scale hierarchical classification by rewiring: a data-driven filter based approach. J Intell Inf Syst 52, 141–164 (2019). https://doi.org/10.1007/s10844-018-0509-4
