research-article

Hierarchical Bayesian Inference and Recursive Regularization for Large-Scale Classification

Published: 13 April 2015

Abstract

In this article, we address open challenges in large-scale classification, focusing on how to effectively leverage the dependency structures (hierarchical or graphical) among class labels, and how to make inference scalable when jointly optimizing all model parameters. We propose two main approaches, namely the hierarchical Bayesian inference framework and the recursive regularization scheme. The key idea in both approaches is to reinforce the similarity among parameters across the nodes in a hierarchy or network based on the proximity and connectivity of the nodes. For scalability, we develop hierarchical variational inference algorithms and fast dual coordinate descent training procedures with parallelization. In our experiments on classification problems with hundreds of thousands of classes, millions of training instances, and terabytes of parameters, the proposed methods show consistent and statistically significant improvements over other competing approaches and achieve the best results on multiple benchmark datasets for large-scale classification.
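The recursive regularization idea summarized above can be sketched for the tree case. This is a minimal illustration under stated assumptions, not the article's implementation: it assumes a squared-distance penalty 0.5 * ||w_n - w_parent(n)||^2 summed over all nodes, a root that is pulled toward the zero vector, and a data loss attached only to leaf nodes. All function names and the toy tree are hypothetical. Under those assumptions, fixing every other node and setting the gradient with respect to an internal node's vector to zero gives a closed-form update: the average of its parent's vector and its children's vectors. The leaf-node step, which would involve the training loss (for example via a dual coordinate descent solver), is not shown.

```python
from typing import Dict, List, Optional

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance ||a - b||^2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def recursive_penalty(weights: Dict[str, Vector],
                      parent: Dict[str, Optional[str]]) -> float:
    """Sum over all nodes of 0.5 * ||w_n - w_parent(n)||^2.

    Assumption: the root (parent None) is pulled toward the zero vector."""
    total = 0.0
    for node, w in weights.items():
        p = parent[node]
        w_p = weights[p] if p is not None else [0.0] * len(w)
        total += 0.5 * sq_dist(w, w_p)
    return total


def update_internal_node(node: str,
                         weights: Dict[str, Vector],
                         parent: Dict[str, Optional[str]],
                         children: Dict[str, List[str]]) -> Vector:
    """Closed-form minimizer of recursive_penalty with respect to w_node,
    holding every other node fixed. Valid only if the node has no data-loss
    term of its own (assumed here: only leaves carry a loss)."""
    dim = len(weights[node])
    p = parent[node]
    # Start from the parent's vector (zero vector for the root).
    acc = list(weights[p]) if p is not None else [0.0] * dim
    # Add every child's vector.
    for c in children[node]:
        acc = [s + x for s, x in zip(acc, weights[c])]
    k = 1 + len(children[node])  # one parent term plus one term per child
    return [s / k for s in acc]


if __name__ == "__main__":
    # Hypothetical toy tree: root -> {a, b}, with 2-dimensional weight vectors.
    weights = {"root": [0.0, 0.0], "a": [2.0, 0.0], "b": [0.0, 2.0]}
    parent = {"root": None, "a": "root", "b": "root"}
    children = {"root": ["a", "b"], "a": [], "b": []}
    print(recursive_penalty(weights, parent))
    weights["root"] = update_internal_node("root", weights, parent, children)
    print(weights["root"], recursive_penalty(weights, parent))
```

On the toy tree, the update moves the root vector to [2/3, 2/3] and lowers the penalty from 4.0 to about 2.67, which is the "pull connected nodes together" effect in miniature. For graphical dependencies, the same idea would sum the penalty over graph edges rather than over parent links.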

Information

Published In

ACM Transactions on Knowledge Discovery from Data, Volume 9, Issue 3
TKDD Special Issue (SIGKDD'13)
April 2015
313 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/2737800
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 April 2015
Accepted: 01 April 2014
Revised: 01 February 2014
Received: 01 October 2013
Published in TKDD Volume 9, Issue 3

Author Tags

  1. Bayesian methods
  2. Large-scale optimization
  3. hierarchical classification

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Gordon and Betty Moore Foundation in the eScience project
  • NSF under award CCF_1019104
  • National Science Foundation (NSF)

Article Metrics

  • Downloads (last 12 months): 22
  • Downloads (last 6 weeks): 3
Reflects downloads up to 02 Feb 2025

Cited By

  • (2024) Partial label learning for automated classification of single-cell transcriptomic profiles. PLOS Computational Biology 20:4 (e1012006). DOI: 10.1371/journal.pcbi.1012006. Online publication date: 5-Apr-2024
  • (2024) SDHC: Joint Semantic-Data Guided Hierarchical Classification for Fine-Grained HRRP Target Recognition. IEEE Transactions on Aerospace and Electronic Systems 60:4 (3993-4009). DOI: 10.1109/TAES.2024.3373378. Online publication date: Aug-2024
  • (2024) Sparse Feature-Persistent Hierarchical Classification. NAECON 2024 - IEEE National Aerospace and Electronics Conference (147-152). DOI: 10.1109/NAECON61878.2024.10670617. Online publication date: 15-Jul-2024
  • (2024) Online hierarchical streaming feature selection based on adaptive neighborhood rough set. Applied Soft Computing (111276). DOI: 10.1016/j.asoc.2024.111276. Online publication date: Jan-2024
  • (2023) Multi-Label Classification in Anime Illustrations Based on Hierarchical Attribute Relationships. Sensors 23:10 (4798). DOI: 10.3390/s23104798. Online publication date: 16-May-2023
  • (2023) Semantic guided level-category hybrid prediction network for hierarchical image classification. International Journal of Wavelets, Multiresolution and Information Processing 21:06. DOI: 10.1142/S0219691323500236. Online publication date: 20-May-2023
  • (2023) Hierarchical Multi-Label Attribute Classification With Graph Convolutional Networks on Anime Illustration. IEEE Access 11 (35447-35456). DOI: 10.1109/ACCESS.2023.3265728. Online publication date: 2023
  • (2023) Online feature selection for hierarchical classification learning based on improved ReliefF. Concurrency and Computation: Practice and Experience 35:27. DOI: 10.1002/cpe.7844. Online publication date: 20-Jul-2023
  • (2022) Hierarchical Semantic Risk Minimization for Large-Scale Classification. IEEE Transactions on Cybernetics 52:9 (9546-9558). DOI: 10.1109/TCYB.2021.3059631. Online publication date: Sep-2022
  • (2021) MATCH: Metadata-Aware Text Classification in A Large Hierarchy. Proceedings of the Web Conference 2021 (3246-3257). DOI: 10.1145/3442381.3449979. Online publication date: 19-Apr-2021