research-article

Public Access

Deep Learning for Extreme Multi-label Text Classification

Authors:

Wei-Cheng Chang,

Yiming YangAuthors Info & Claims

SIGIR '17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pages 115 - 124

https://doi.org/10.1145/3077136.3080834

Published: 07 August 2017 Publication History

Abstract

Extreme multi-label text classification (XMTC) refers to the problem of assigning to each document its most relevant subset of class labels from an extremely large label collection, where the number of labels could reach hundreds of thousands or millions. The huge label space raises research challenges such as data sparsity and scalability. Significant progress has been made in recent years by the development of new machine learning methods, such as tree induction with large-margin partitions of the instance spaces and label-vector embedding in the target space. However, deep learning has not been explored for XMTC, despite its big successes in other related areas. This paper presents the first attempt at applying deep learning to XMTC, with a family of new Convolutional Neural Network (CNN) models which are tailored for multi-label classification in particular. With a comparative evaluation of 7 state-of-the-art methods on 6 benchmark datasets where the number of labels is up to 670,000, we show that the proposed CNN approach successfully scaled to the largest datasets, and consistently produced the best or the second best results on all the datasets. On the Wikipedia dataset with over 2 million documents and 500,000 labels in particular, it outperformed the second best method by 11.7%~15.3% in precision@K and by 11.5%~11.7% in NDCG@K for K = 1,3,5.

References

[1]

Rahul Agrawal, Archit Gupta, Yashoteja Prabhu, and Manik Varma. 2013. Multilabel learning with millions of labels: Recommending advertiser bid phrases for web pages. In Proceedings of the 22nd international conference on World Wide Web. ACM, 13--24.

Digital Library

[2]

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014).

[3]

Krishnakumar Balasubramanian and Guy Lebanon. 2012. The landmark selection method for multiple output prediction. arXiv preprint arXiv:1206.6479 (2012).

[4]

James Bergstra, Olivier Breuleux, Frédéric Bastien, Pascal Lamblin, Razvan Pascanu, Guillaume Desjardins, Joseph Turian, David Warde-Farley, and Yoshua Bengio. 2010. Theano: A CPU and GPU math compiler in Python. In Proc. 9th Python in Science Conf. 1--7.

[5]

Kush Bhatia, Himanshu Jain, Purushottam Kar, Manik Varma, and Prateek Jain. 2015. Sparse local embeddings for extreme multi-label classification. In Advances in Neural Information Processing Systems. 730--738.

[6]

Wei Bi and James Tin-Yau Kwok. 2013. Efficient Multi-label Classification with Many Labels. In ICML (3). 405--413.

[7]

Matthew R. Boutell, Jiebo Luo, Xipeng Shen, and Christopher M. Brown. 2004. Learning multi-label scene classification. Pattern recognition 37, 9 (2004), 1757--1771.

[8]

Yubo Chen, Liheng Xu, Kang Liu, Daojian Zeng, and Jun Zhao. 2015. Event Extraction via Dynamic Multi-Pooling Convolutional Neural Networks. In ACL. 167--176.

[9]

Yao-Nan Chen and Hsuan-Tien Lin. 2012. Feature-aware label space dimension reduction for multi-label classification. In Advances in Neural Information Processing Systems. 1529--1537.

[10]

Moustapha M. Cisse, Nicolas Usunier, Thierry Artieres, and Patrick Gallinari. 2013. Robust bloom filters for large multilabel classification tasks. In Advances in Neural Information Processing Systems. 1851--1859.

[11]

Amanda Clare and Ross D. King. 2001. Knowledge discovery in multi-label phenotype data. In European Conference on Principles of Data Mining and Knowledge Discovery. Springer, 42--53.

Digital Library

[12]

Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural language processing (almost) from scratch. Journal of Machine Learning Research 12, Aug (2011), 2493--2537.

Digital Library

[13]

André Elisseeff and Jason Weston. 2001. A kernel method for multi-labelled classification. In Advances in neural information processing systems. 681--687.

[14]

Chun-Sung Ferng and Hsuan-Tien Lin. 2011. Multi-label Classification with Error-correcting Codes. In ACML. 281--295.

[15]

Johannes Fürnkranz, Eyke Hüllermeier, Eneldo Loza Mencía, and Klaus Brinker. 2008. Multilabel classification via calibrated label ranking. Machine learning 73, 2 (2008), 133--153.

Digital Library

[16]

Sayan Ghosh, Eugene Laksana, Stefan Scherer, and Louis-Philippe Morency. 2015. A multi-label convolutional neural network approach to cross-domain action unit detection. In Affective Computing and Intelligent Interaction (ACII), 2015 International Conference on. IEEE, 609--615.

Digital Library

[17]

Yunchao Gong, Yangqing Jia, Thomas Leung, Alexander Toshev, and Sergey Ioffe. 2013. Deep convolutional ranking for multilabel image annotation. arXiv preprint arXiv:1312.4894 (2013).

[18]

Matthieu Guillaumin, Thomas Mensink, Jakob Verbeek, and Cordelia Schmid. 2009. Tagprop: Discriminative metric learning in nearest neighbor models for image auto-annotation. In Computer Vision, 2009 IEEE 12th International Conference on. IEEE, 309--316.

[19]

Daniel Hsu, Sham Kakade, John Langford, and Tong Zhang. 2009. Multi-Label Prediction via Compressed Sensing. In NIPS, Vol. 22. 772--780.

Digital Library

[20]

Shuiwang Ji, Lei Tang, Shipeng Yu, and Jieping Ye. 2008. Extracting shared subspace for multi-label classification. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 381--389.

Digital Library

[21]

Rie Johnson and Tong Zhang. 2015. Effective use of word order for text categorization with convolutional neural networks. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologie. 103--112.

[22]

Rie Johnson and Tong Zhang. 2015. Semi-supervised convolutional neural networks for text categorization via region embedding. In Advances in neural information processing systems. 919--927.

[23]

Armand Joulin, Edouard Grave, Piotr Bojanowski, and Tomas Mikolov. 2016. Bag of tricks for efficient text classification. arXiv preprint arXiv:1607.01759 (2016).

[24]

Ashish Kapoor, Raajay Viswanathan, and Prateek Jain. 2012. Multilabel classification using bayesian compressed sensing. In Advances in Neural Information Processing Systems. 2645--2653.

[25]

Yoon Kim. 2014. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 1746--1751.

[26]

Gakuto Kurata, Bing Xiang, and Bowen Zhou. 2016. Improved neural networkbased multi-label classification with better initialization leveraging label cooccurrence. In Proceedings of NAACL-HLT. 521--526.

[27]

Siwei Lai, Liheng Xu, Kang Liu, and Jun Zhao. 2015. Recurrent Convolutional Neural Networks for Text Classification. In AAAI. 2267--2273.

[28]

Jure Leskovec and Andrej Krevl. 2015. {SNAP Datasets}:{Stanford} Large Network Dataset Collection. (2015).

[29]

David D. Lewis, Yiming Yang, Tony G. Rose, and Fan Li. 2004. Rcv1: A new benchmark collection for text categorization research. Journal of machine learning research 5, Apr (2004), 361--397.

Digital Library

[30]

Julian McAuley and Jure Leskovec. 2013. Hidden factors and hidden topics: understanding rating dimensions with review text. In Proceedings of the 7th ACM conference on Recommender systems. ACM, 165--172.

Digital Library

[31]

Eneldo Loza Mencia and Johannes Fürnkranz. 2008. Efficient pairwise multilabel classification for large-scale problems in the legal domain. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 50--65.

Digital Library

[32]

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013).

[33]

Tomas Mikolov, Martin Karafiát, Lukas Burget, Jan Cernockỳ, and Sanjeev Khudanpur. 2010. Recurrent neural network based language model. In Interspeech, Vol. 2. 3.

[34]

Jinseok Nam, Jungi Kim, Eneldo Loza Mencía, Iryna Gurevych, and Johannes Fürnkranz. 2014. Large-scale multi-label text classification fire visiting neural networks. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 437--452.

Digital Library

[35]

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. Glove: Global Vectors for Word Representation. In EMNLP, Vol. 14. 1532--1543.

[36]

Yashoteja Prabhu and Manik Varma. 2014. Fastxml: A fast, accurate and stable tree-classifier for extreme multi-label learning. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 263--272.

Digital Library

[37]

Richard Socher, Alex Perelygin, Jean Y. Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the conference on empirical methods in natural language processing (EMNLP), Vol. 1631. Citeseer, 1642.

[38]

Ilya Sutskever, Oriol Vinyals, and Quoc V Le. 2014. Sequence to sequence learning with neural networks. In Advances in neural information processing systems. 3104--3112.

[39]

Farbound Tai and Hsuan-Tien Lin. 2012. Multilabel classification with principal label space transformation. Neural Computation 24, 9 (2012), 2508--2542.

Digital Library

[40]

Jason Weston, Samy Bengio, and Nicolas Usunier. 2011. Wsabie: Scaling up to large vocabulary image annotation. (2011).

[41]

Jason Weston, Ameesh Makadia, and Hector Yee. 2013. Label Partitioning For Sublinear Ranking. In ICML (2). 181--189.

[42]

Yiming Yang and Siddharth Gopal. 2012. Multilabel classification with metalevel features in a learning-to-rank framework. Machine Learning 88, 1--2 (2012), 47--68.

Digital Library

[43]

Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. 2016. Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.

[44]

Chih-Kuan Yeh, Wei-Chieh Wu, Wei-Jen Ko, and Yu-Chiang Frank Wang. 2017. Learning Deep Latent Spaces for Multi-Label Classification. (2017).

[45]

Ian E. H. Yen, Xiangru Huang, Kai Zhong, Pradeep Ravikumar, and Inderjit S. Dhillon. 2016. PD-Sparse: A Primal and Dual Sparse Approach to Extreme Multiclass and Multilabel Classification. (2016).

[46]

Hsiang-Fu Yu, Prateek Jain, Purushottam Kar, and Inderjit S. Dhillon. 2014. Largescale Multi-label Learning with Missing Labels. In Proceedings of the 31th International Conference on Machine Learning. 593--601.

[47]

Min-Ling Zhang and Zhi-Hua Zhou. 2006. Multilabel neural networks with applications to functional genomics and text categorization. IEEE transactions on Knowledge and Data Engineering 18, 10 (2006), 1338--1351.

Digital Library

[48]

Min-Ling Zhang and Zhi-Hua Zhou. 2007. ML-KNN: A lazy learning approach to multi-label learning. Pattern recognition 40, 7 (2007), 2038--2048.

Digital Library

[49]

Rui Zhang, Honglak Lee, and Dragomir Radev. 2016. Dependency sensitive convolutional neural networks for modeling sentences and documents. arXiv preprint arXiv:1611.02361 (2016).

[50]

Yi Zhang and Jeff G. Schneider. 2011. Multi-Label Output Codes using Canonical Correlation Analysis. In AISTATS. 873--882.

[51]

Arkaitz Zubiaga. 2012. Enhancing navigation on wikipedia with social tags. arXiv preprint arXiv:1202.5469 (2012).

Cited By

Lv SDong JWang CWang XBao Z(2024)RB-GAT: A Text Classification Model Based on RoBERTa-BiGRU with Graph ATtention NetworkSensors10.3390/s2411336524:11(3365)Online publication date: 24-May-2024
https://doi.org/10.3390/s24113365
Min HSon YChoi Y(2024)Determining Optimal Assembly Condition for Lens Module Production by Combining Genetic Algorithm and C-BLSTMProcesses10.3390/pr1203045212:3(452)Online publication date: 23-Feb-2024
https://doi.org/10.3390/pr12030452
Zangari AMarcuzzo MRizzo MGiudice LAlbarelli AGasparetto A(2024)Hierarchical Text Classification and Its Foundations: A Review of Current ResearchElectronics10.3390/electronics1307119913:7(1199)Online publication date: 25-Mar-2024
https://doi.org/10.3390/electronics13071199
Show More Cited By

Index Terms

Deep Learning for Extreme Multi-label Text Classification
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
  2. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification
    2. Machine learning approaches
      1. Learning latent representations
      2. Neural networks

Recommendations

Deep Extreme Multi-label Learning
ICMR '18: Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval

Extreme multi-label learning (XML) or classification has been a practical and important problem since the boom of big data. The main challenge lies in the exponential label space which involves 2^L possible label sets especially when the label dimension ...
Dynamic Label Propagation for Semi-supervised Multi-class Multi-label Classification
ICCV '13: Proceedings of the 2013 IEEE International Conference on Computer Vision

In graph-based semi-supervised learning approaches, the classification rate is highly dependent on the size of the availabel labeled data, as well as the accuracy of the similarity measures. Here, we propose a semi-supervised multi-class/multi-label ...
Instance Annotation for Multi-Instance Multi-Label Learning
Special Issue on ACM SIGKDD 2012

Multi-instance multi-label learning (MIML) is a framework for supervised classification where the objects to be classified are bags of instances associated with multiple labels. For example, an image can be represented as a bag of segments and ...

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences

SIGIR '17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval

August 2017

1476 pages

ISBN:9781450350228

DOI:10.1145/3077136

General Chairs:
Noriko Kando
National Institute of Informatics
,
Tetsuya Sakai
Waseda University
,
Hideo Joho
University of Tsukuba
,
Program Chairs:
Hang Li
Huawei Noah's Ark Lab
,
Arjen P. de Vries
Radboud University
,
Ryen W. White
Microsoft Cortana

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 August 2017

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

National Science Foundation

Conference

SIGIR '17

Sponsor:

SIGIR

SIGIR '17: The 40th International ACM SIGIR conference on research and development in Information Retrieval

August 7 - 11, 2017

Tokyo, Shinjuku, Japan

Acceptance Rates

SIGIR '17 Paper Acceptance Rate 78 of 362 submissions, 22%;

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

370
Total Citations
View Citations
11,885
Total Downloads

Downloads (Last 12 months)1,803
Downloads (Last 6 weeks)256

Reflects downloads up to 23 Sep 2024

Other Metrics

View Author Metrics

Citations

Cited By

Lv SDong JWang CWang XBao Z(2024)RB-GAT: A Text Classification Model Based on RoBERTa-BiGRU with Graph ATtention NetworkSensors10.3390/s2411336524:11(3365)Online publication date: 24-May-2024
https://doi.org/10.3390/s24113365
Min HSon YChoi Y(2024)Determining Optimal Assembly Condition for Lens Module Production by Combining Genetic Algorithm and C-BLSTMProcesses10.3390/pr1203045212:3(452)Online publication date: 23-Feb-2024
https://doi.org/10.3390/pr12030452
Zangari AMarcuzzo MRizzo MGiudice LAlbarelli AGasparetto A(2024)Hierarchical Text Classification and Its Foundations: A Review of Current ResearchElectronics10.3390/electronics1307119913:7(1199)Online publication date: 25-Mar-2024
https://doi.org/10.3390/electronics13071199
Wu TYang S(2024)Contrastive Enhanced Learning for Multi-Label Text ClassificationApplied Sciences10.3390/app1419865014:19(8650)Online publication date: 25-Sep-2024
https://doi.org/10.3390/app14198650
Ma BWu JYan W(2024)JudPriNet: Video Transition Detection Based on Semantic Relationship and Monte Carlo SamplingIntelligent and Converged Networks10.23919/ICN.2024.00105:2(134-146)Online publication date: Jun-2024
https://doi.org/10.23919/ICN.2024.0010
Shang ZChauhan VDevi KPatil S(2024)Artificial Intelligence, the Digital Surgeon: Unravelling Its Emerging Footprint in Healthcare – The Narrative ReviewJournal of Multidisciplinary Healthcare10.2147/JMDH.S482757Volume 17(4011-4022)Online publication date: Aug-2024
https://doi.org/10.2147/JMDH.S482757
Jing PLiu XZhang LLi YLiu YSu Y(2024)Multimodal Attentive Representation Learning for Micro-video Multi-label ClassificationACM Transactions on Multimedia Computing, Communications, and Applications10.1145/364388820:6(1-23)Online publication date: 8-Mar-2024
https://dl.acm.org/doi/10.1145/3643888
Wang HPeng CDong HFeng LLiu WHu TChen KChen G(2024)On the Value of Head Labels in Multi-Label Text ClassificationACM Transactions on Knowledge Discovery from Data10.1145/364385318:5(1-21)Online publication date: 26-Mar-2024
https://dl.acm.org/doi/10.1145/3643853
Mahindrakar SMondal TDhakne AArosh SBhattacharya I(2024)Performance Analysis of Tweet Summarization Techniques Considering Crisis DynamicsProceedings of the 25th International Conference on Distributed Computing and Networking10.1145/3631461.3631951(418-423)Online publication date: 4-Jan-2024
https://dl.acm.org/doi/10.1145/3631461.3631951
Chen ZKerr ACai RKosaian JWu HDing YXie YTsafrir DMusuvathi MGupta RAbu-Ghazaleh N(2024)EVT: Accelerating Deep Learning Training with Epilogue Visitor TreeProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 310.1145/3620666.3651369(301-316)Online publication date: 27-Apr-2024
https://dl.acm.org/doi/10.1145/3620666.3651369
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Table of Contents