research-article

FastXML: a fast, accurate and stable tree-classifier for extreme multi-label learning

Authors:

Yashoteja Prabhu,

Manik Varma

KDD '14: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 263 - 272

https://doi.org/10.1145/2623330.2623651

Published: 24 August 2014

Abstract

The objective in extreme multi-label classification is to learn a classifier that can automatically tag a data point with the most relevant subset of labels from a large label set. Extreme multi-label classification is an important research problem: it not only enables applications with many labels to be tackled but also allows ranking problems to be reformulated with certain advantages over existing formulations. Our objective, in this paper, is to develop an extreme multi-label classifier that is faster to train and more accurate at prediction than the state-of-the-art Multi-label Random Forest (MLRF) algorithm [2] and the Label Partitioning for Sub-linear Ranking (LPSR) algorithm [35]. MLRF and LPSR learn a hierarchy to deal with the large number of labels but optimize task-independent measures, such as the Gini index or clustering error, in order to learn the hierarchy. Our proposed FastXML algorithm achieves significantly higher accuracies by directly optimizing an nDCG-based ranking loss function. We also develop an alternating minimization algorithm for efficiently optimizing the proposed formulation. Experiments reveal that FastXML can be trained on problems with more than a million labels on a standard desktop in eight hours using a single core and in an hour using multiple cores.
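For readers unfamiliar with nDCG, the following is a minimal sketch of the textbook nDCG@k measure for binary label relevance. It is not the FastXML objective itself, whose exact form is not given in this abstract. The function names `dcg_at_k` and `ndcg_at_k` and the example labels are illustrative assumptions.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k positions (ranks start at 1)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(ranked_labels, relevant_labels, k):
    """nDCG@k for one data point with binary relevance.

    ranked_labels: predicted labels, best first (hypothetical input format).
    relevant_labels: set of ground-truth labels for the data point.
    """
    relevances = [1.0 if label in relevant_labels else 0.0 for label in ranked_labels]
    # The ideal ranking places all relevant labels (up to k of them) at the top.
    ideal = [1.0] * min(k, len(relevant_labels))
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0.0:
        return 0.0  # no relevant labels: define nDCG as 0 by convention
    return dcg_at_k(relevances, k) / ideal_dcg


# A ranking that puts both relevant labels first scores higher than one
# that puts an irrelevant label first.
good = ndcg_at_k(["a", "b", "c"], {"a", "b"}, k=3)
worse = ndcg_at_k(["c", "a", "b"], {"a", "b"}, k=3)
```

Because nDCG depends on the positions of the relevant labels, a loss built on it rewards placing relevant labels near the top of the ranking, which measures such as the Gini index or clustering error do not target directly.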

Supplementary Material

MP4 File (p263-sidebyside.mp4)

References

[1] Wikipedia dataset for the 4th large scale hierarchical text classification challenge. http://lshtc.iit.demokritos.gr/.

[2] R. Agrawal, A. Gupta, Y. Prabhu, and M. Varma. Multi-label learning with millions of labels: Recommending advertiser bid phrases for web pages. In WWW, pages 13-24, 2013.

[3] G. Andrew and J. Gao. Scalable training of L1-regularized log-linear models. In ICML, pages 33-40, 2007.

[4] K. Balasubramanian and G. Lebanon. The landmark selection method for multiple output prediction. In ICML, 2012.

[5] S. Bengio, J. Weston, and D. Grangier. Label embedding trees for large multi-class tasks. In NIPS, 2010.

[6] D. Bertsekas. Nonlinear Programming. Athena Scientific, 1999.

[7] W. Bi and J. T.-Y. Kwok. Multilabel classification on tree- and DAG-structured hierarchies. In ICML, 2011.

[8] W. Bi and J. T.-Y. Kwok. Efficient multi-label classification with many labels. In ICML, pages 405-413, 2013.

[9] N. Cesa-Bianchi, C. Gentile, and L. Zaniboni. Incremental algorithms for hierarchical classification. JMLR, 7, 2006.

[10] Y.-N. Chen and H.-T. Lin. Feature-aware label space dimension reduction for multi-label classification. In NIPS, pages 1538-1546, 2012.

[11] A. Choromanska and J. Langford. Logarithmic time online multiclass prediction. http://arxiv.org/abs/1406.1822, 2014.

[12] M. Cisse, N. Usunier, T. Artieres, and P. Gallinari. Robust Bloom filters for large multilabel classification tasks. In NIPS, pages 1851-1859, 2013.

[13] J. Deng, S. Satheesh, A. C. Berg, and F. Li. Fast and balanced: Efficient label tree learning for large scale object recognition. In NIPS, 2011.

[14] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. JMLR, 9:1871-1874, 2008.

[15] C.-S. Feng and H.-T. Lin. Multi-label classification with error-correcting codes. JMLR, pages 289-295, 2011.

[16] T. Gao and D. Koller. Discriminative learning of relaxed hierarchy for large-scale visual recognition. In ICCV, pages 2072-2079, 2011.

[17] P. Geurts, D. Ernst, and L. Wehenkel. Extremely randomized trees. ML, pages 3-42, 2006.

[18] B. Hariharan, S. V. N. Vishwanathan, and M. Varma. Efficient max-margin multi-label classification with applications to zero-shot learning. ML, 2012.

[19] D. Hsu, S. Kakade, J. Langford, and T. Zhang. Multi-label prediction via compressed sensing. In NIPS, 2009.

[20] S. Ji, L. Tang, S. Yu, and J. Ye. Extracting shared subspace for multi-label classification. In KDD, pages 381-389, 2008.

[21] C. Jose, P. Goyal, P. Aggrwal, and M. Varma. Local deep kernel learning for efficient non-linear SVM prediction. In ICML, June 2013.

[22] A. Kapoor, R. Viswanathan, and P. Jain. Multilabel classification using Bayesian compressed sensing. In NIPS, 2012.

[23] I. Katakis, G. Tsoumakas, and I. Vlahavas. Multilabel text classification for automated tag suggestion. In ECML/PKDD Discovery Challenge, 2008.

[24] K. Koh, S.-J. Kim, and S. Boyd. An interior-point method for large-scale L1-regularized logistic regression. JMLR, 8:1519-1555, 2007.

[25] A. Kustarev, Y. Ustinovsky, Y. Logachev, E. Grechnikov, I. Segalovich, and P. Serdyukov. Smoothing nDCG metrics using tied scores. In CIKM, pages 2053-2056, 2011.

[26] P. D. Ravikumar, A. Tewari, and E. Yang. On nDCG consistency of listwise ranking methods. In AISTATS, pages 618-626, 2011.

[27] J. Rousu, C. Saunders, S. Szedmak, and J. Shawe-Taylor. Kernel-based learning of hierarchical multilabel classification models. JMLR, 7, 2006.

[28] C. Snoek, M. Worring, J. van Gemert, J.-M. Geusebroek, and A. Smeulders. The challenge problem for automated detection of 101 semantic concepts in multimedia. In ACM Multimedia, pages 421-430, 2006.

[29] F. Tai and H.-T. Lin. Multi-label classification with principal label space transformation. In Workshop proceedings of learning from multi-label data, 2010.

[30] G. Tsoumakas, I. Katakis, and I. Vlahavas. Effective and efficient multilabel classification in domains with large number of labels. In ECML/PKDD 2008 Workshop on Mining Multidimensional Data, 2008.

[31] H. Valizadegan, R. Jin, R. Zhang, and J. Mao. Learning to rank by optimizing nDCG measure. In SIGIR, pages 41-48, 2000.

[32] M. N. Volkovs and R. S. Zemel. BoltzRank: Learning to maximize expected ranking gain. In ICML, pages 1089-1096, 2009.

[33] Y. Wang, L. Wang, Y. Li, D. He, and T.-Y. Liu. A theoretical analysis of nDCG type ranking measures. In COLT, pages 25-54, 2013.

[34] J. Weston, S. Bengio, and N. Usunier. Wsabie: Scaling up to large vocabulary image annotation. In IJCAI, 2011.

[35] J. Weston, A. Makadia, and H. Yee. Label partitioning for sublinear ranking. In ICML, volume 28, pages 181-189, 2013.

[36] H.-F. Yu, P. Jain, P. Kar, and I. S. Dhillon. Large-scale multi-label learning with missing labels. In ICML, 2014.

[37] G.-X. Yuan, C.-H. Ho, and C.-J. Lin. An improved GLMNET for L1-regularized logistic regression. JMLR, 13:1999-2030, 2012.

[38] Y. Zhang and J. G. Schneider. Multi-label output codes using canonical correlation analysis. In AISTATS, pages 873-882, 2011.

Index Terms

1. Computing methodologies
  1. Machine learning

Published In

KDD '14: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining

August 2014

2028 pages

ISBN:9781450329569

DOI:10.1145/2623330

General Chairs: Sofus Macskassy (Facebook), Claudia Perlich (Dstillery)

Program Chairs: Jure Leskovec (Stanford University), Wei Wang (UCLA), Rayid Ghani (University of Chicago)

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

TCS Phd fellowship

Conference

KDD '14

KDD '14: The 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 24 - 27, 2014

New York, New York, USA

Acceptance Rates

KDD '14 Paper Acceptance Rate 151 of 1,036 submissions, 15%;

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Bibliometrics

Total Citations: 192
Total Downloads: 1,502
Downloads (Last 12 months): 64
Downloads (Last 6 weeks): 5

Reflects downloads up to 15 Oct 2024

Cited By

de Campos L, Fernández-Luna J, Huete J, Ribadas-Pena F, Bolaños N (2024). Information Retrieval and Machine Learning Methods for Academic Expert Finding. Algorithms 17:2 (51). Online publication date: 23-Jan-2024.
https://doi.org/10.3390/a17020051

Ostapuk N, Audiffren J, Dolamic L, Mermoud A, Cudré-Mauroux P, Chua T, Ngo C, Ka-Wei Lee R, Kumar R, Lauw H (2024). Follow the Path: Hierarchy-Aware Extreme Multi-Label Completion for Semantic Text Tagging. Proceedings of the ACM Web Conference 2024 (2094-2105). Online publication date: 13-May-2024.
https://dl.acm.org/doi/10.1145/3589334.3645558

Ye H, Sunderraman R, Ji S (2024). MatchXML: An Efficient Text-Label Matching Framework for Extreme Multi-Label Text Classification. IEEE Transactions on Knowledge and Data Engineering 36:9 (4781-4793). Online publication date: Sep-2024.
https://doi.org/10.1109/TKDE.2024.3374750

Peng C, Wang H, Wang J, Shou L, Chen K, Chen G, Yao C (2024). Learning Label-Adaptive Representation for Large-Scale Multi-Label Text Classification. IEEE/ACM Transactions on Audio, Speech and Language Processing 32 (2630-2640). Online publication date: 1-Jan-2024.
https://dl.acm.org/doi/10.1109/TASLP.2024.3393722

Ostapuk N, Dolamic L, Mermoud A, Cudré-Mauroux P (2024). Leveraging Pre-Trained Extreme Multi-Label Classifiers for Zero-Shot Learning. 2024 11th IEEE Swiss Conference on Data Science (SDS) (233-236). Online publication date: 30-May-2024.
https://doi.org/10.1109/SDS60720.2024.00041

Mylonas N, Mollas I, Bassiliades N, Tsoumakas G (2024). Exploring local interpretability in dimensionality reduction. Expert Systems with Applications: An International Journal 252:PA. Online publication date: 24-Jul-2024.
https://dl.acm.org/doi/10.1016/j.eswa.2024.124074

Fan J, Zhang H, Min F (2024). Learning cluster-wise label distribution for label enhancement. International Journal of Machine Learning and Cybernetics. Online publication date: 27-Aug-2024.
https://doi.org/10.1007/s13042-024-02343-9

Wang Z, Xu Q, Yang Z, Wen P, He Y, Cao X, Huang Q (2024). Top-K Pairwise Ranking: Bridging the Gap Among Ranking-Based Measures for Multi-label Classification. International Journal of Computer Vision. Online publication date: 26-Jul-2024.
https://doi.org/10.1007/s11263-024-02157-w

Zhao F, Ai Q, Li X, Wang W, Gao Q, Liu Y (2024). TLC-XML: Transformer with Label Correlation for Extreme Multi-label Text Classification. Neural Processing Letters 56:1. Online publication date: 10-Feb-2024.
https://doi.org/10.1007/s11063-024-11460-z

Zhao F, Tao R, Wang W, Cui B, Xu Y, Ai Q (2024). Collaborative learning of supervision and correlation for generalized zero-shot extreme multi-label learning. Applied Intelligence 54:8 (6285-6298). Online publication date: 9-May-2024.
https://doi.org/10.1007/s10489-024-05498-8