DOI: 10.1145/3077136.3080834

Deep Learning for Extreme Multi-label Text Classification

Published: 07 August 2017

Abstract

Extreme multi-label text classification (XMTC) refers to the problem of assigning to each document its most relevant subset of class labels from an extremely large label collection, where the number of labels can reach hundreds of thousands or millions. The huge label space raises research challenges such as data sparsity and scalability. Significant progress has been made in recent years through new machine learning methods, such as tree induction with large-margin partitions of the instance space and label-vector embedding in the target space. However, deep learning has not been explored for XMTC, despite its major successes in related areas. This paper presents the first attempt at applying deep learning to XMTC, with a family of new Convolutional Neural Network (CNN) models tailored specifically for multi-label classification. In a comparative evaluation of 7 state-of-the-art methods on 6 benchmark datasets, where the number of labels is up to 670,000, we show that the proposed CNN approach scaled successfully to the largest datasets and consistently produced the best or second-best results on all of them. In particular, on the Wikipedia dataset, with over 2 million documents and 500,000 labels, it outperformed the second-best method by 11.7% to 15.3% in precision@K and by 11.5% to 11.7% in NDCG@K for K = 1, 3, 5.
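The abstract reports results in precision@K and NDCG@K without defining them. The following is a minimal sketch of how these two ranking metrics are commonly computed for a single document with binary relevance labels. It assumes the usual definitions (log base 2 discount, ideal ranking truncated at K); the function names `precision_at_k` and `ndcg_at_k` are hypothetical and are not taken from the paper or its code.

```python
import math


def top_k_labels(scores, k):
    """Return the indices of the k highest-scoring labels, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def precision_at_k(scores, relevant, k):
    """Fraction of the top-k predicted labels that are truly relevant."""
    hits = sum(1 for label in top_k_labels(scores, k) if label in relevant)
    return hits / k


def ndcg_at_k(scores, relevant, k):
    """DCG of the top-k predictions divided by the DCG of an ideal ranking.

    Assumption: binary gains and a 1 / log2(rank + 1) discount, with
    ranks starting at 1.
    """
    dcg = sum(
        1.0 / math.log2(position + 2)
        for position, label in enumerate(top_k_labels(scores, k))
        if label in relevant
    )
    # An ideal ranking places all relevant labels first, up to k of them.
    ideal_hits = min(k, len(relevant))
    ideal_dcg = sum(1.0 / math.log2(position + 2) for position in range(ideal_hits))
    return dcg / ideal_dcg if ideal_dcg > 0 else 0.0


if __name__ == "__main__":
    # Toy example: five labels, of which labels 1 and 2 are relevant.
    scores = [0.1, 0.9, 0.3, 0.8, 0.05]
    relevant = {1, 2}
    for k in (1, 3, 5):
        print(k, precision_at_k(scores, relevant, k), ndcg_at_k(scores, relevant, k))
```

In an evaluation over a whole test set, these per-document values would typically be averaged across documents for each K.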



Published In

SIGIR '17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval
August 2017
1476 pages
ISBN:9781450350228
DOI:10.1145/3077136
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. convolutional neural network
  2. deep learning
  3. extreme text classification
  4. multi-label

Qualifiers

  • Research-article

Conference

SIGIR '17
Acceptance Rates

SIGIR '17 paper acceptance rate: 78 of 362 submissions (22%)
Overall acceptance rate: 792 of 3,983 submissions (20%)

