On the Value of Head Labels in Multi-Label Text Classification

Published: 26 March 2024
Abstract

A formidable challenge in multi-label text classification (MLTC) is that the labels often exhibit a long-tailed distribution, which typically prevents deep MLTC models from obtaining satisfactory performance. To alleviate this problem, most existing solutions attempt to improve tail-label performance through sampling or by introducing extra knowledge. Data-rich head labels, though more trustworthy, have not received the attention they deserve. In this work, we propose a multi-stage training framework that exploits both model-level and feature-level knowledge from the head labels to improve the representation and generalization ability of MLTC models. Moreover, we theoretically prove the superiority of our framework design over the alternatives. Comprehensive experiments on widely used MLTC datasets demonstrate that the proposed framework substantially outperforms state-of-the-art methods, highlighting the value of head labels in MLTC.
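This page carries no code, and the framework's details live in the full text. Purely as a hedged illustration of the head-first idea the abstract describes, the sketch below partitions labels by training frequency and then trains in two stages: first on the data-rich head labels to shape the encoder's representations, then on the full label set starting from that head-trained encoder. Every name here (split_head_tail, MLTCModel, train_stage, the 20% head cutoff) is a hypothetical placeholder, not the authors' actual method.

```python
# Hypothetical two-stage, head-first training sketch (NOT the paper's actual
# algorithm). Assumes a PyTorch text encoder that maps a batch of inputs to
# fixed-size vectors, and a multi-hot label matrix of shape (n_docs, n_labels).
import numpy as np
import torch
import torch.nn as nn


def split_head_tail(label_matrix: np.ndarray, head_fraction: float = 0.2):
    """Rank labels by training frequency; the top fraction are 'head' labels."""
    freq = label_matrix.sum(axis=0)                # number of documents per label
    order = np.argsort(-freq)                      # most frequent labels first
    n_head = max(1, int(head_fraction * len(order)))
    return order[:n_head], order[n_head:]          # head label ids, tail label ids


class MLTCModel(nn.Module):
    """A text encoder followed by one linear classifier over all labels."""

    def __init__(self, encoder: nn.Module, hidden_dim: int, n_labels: int):
        super().__init__()
        self.encoder = encoder                     # e.g., a BERT-style encoder
        self.classifier = nn.Linear(hidden_dim, n_labels)

    def forward(self, x):
        return self.classifier(self.encoder(x))    # one logit per label


def train_stage(model, loader, label_ids, epochs=3, lr=1e-4):
    """Run one training stage restricted to a subset of label columns."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()               # standard multi-label loss
    ids = torch.as_tensor(label_ids)
    for _ in range(epochs):
        for x, y in loader:                        # y: float multi-hot targets
            logits = model(x)[:, ids]              # score only this label subset
            loss = loss_fn(logits, y[:, ids])
            opt.zero_grad()
            loss.backward()
            opt.step()


# Hypothetical usage:
#   head_ids, _ = split_head_tail(Y_train, head_fraction=0.2)
#   train_stage(model, loader, head_ids)                       # stage 1: head only
#   train_stage(model, loader, np.arange(Y_train.shape[1]))    # stage 2: all labels
```

The intuition, consistent with the abstract's framing, is that frequent labels provide plentiful, reliable gradients for representation learning, so a classifier trained on top of the head-trained encoder should generalize better to the data-scarce tail; the actual framework additionally transfers model- and feature-level knowledge in ways this sketch does not attempt to reproduce.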




      Published In

      ACM Transactions on Knowledge Discovery from Data  Volume 18, Issue 5
      June 2024
      699 pages
      ISSN:1556-4681
      EISSN:1556-472X
      DOI:10.1145/3613659

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 26 March 2024
      Online AM: 05 February 2024
      Accepted: 24 January 2024
      Revised: 14 December 2023
      Received: 26 May 2022
      Published in TKDD Volume 18, Issue 5


      Author Tags

      1. Multi-label text classification
      2. long-tail
      3. self-supervised learning

      Qualifiers

      • Research-article

      Funding Sources

      • Pioneer R&D Program of Zhejiang
