
A Dual-branch Learning Model with Gradient-balanced Loss for Long-tailed Multi-label Text Classification

Published: 27 September 2023

Abstract

Multi-label text classification has a wide range of real-world applications. However, real-world data distributions are often imbalanced, which leads to serious long-tailed problems. For multi-label classification, owing to the vast scale of datasets and the existence of label co-occurrence, effectively improving the prediction accuracy of tail labels without degrading overall precision becomes an important challenge. To address this issue, we propose a Dual-Branch Learning Model with Gradient-Balanced Loss (DBGB), built on the paradigm of existing pre-trained state-of-the-art multi-label classification models. Our model introduces two improvements targeting the long-tailed problem. First, on top of a shared text representation, a dual classifier processes two label distributions: the original data distribution and a distribution in which head labels are under-sampled to strengthen the prediction of tail labels. Second, the proposed gradient-balanced loss adaptively suppresses the accumulation of negative gradients on labels, especially tail labels. We perform extensive experiments on three multi-label text classification datasets. The results show that the proposed method achieves competitive overall performance compared with state-of-the-art multi-label classification methods, with significant improvement in tail-label accuracy.
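
To make the dual-branch design concrete, the following is a minimal PyTorch sketch of the architecture as the abstract describes it: a shared encoder feeds two classifier heads, one trained on the original label distribution and one on a distribution with head labels under-sampled. The class name, the [CLS] pooling, and the logit-averaging fusion at inference are illustrative assumptions, not the paper's released implementation.

    import torch
    import torch.nn as nn

    class DualBranchClassifier(nn.Module):
        """Shared text representation feeding two classifier heads: one for
        the original label distribution, one for a distribution in which
        head labels are under-sampled (all names here are illustrative)."""

        def __init__(self, encoder: nn.Module, hidden_size: int, num_labels: int):
            super().__init__()
            self.encoder = encoder                                # shared representation
            self.head_orig = nn.Linear(hidden_size, num_labels)  # original-distribution branch
            self.head_tail = nn.Linear(hidden_size, num_labels)  # under-sampled branch

        def forward(self, input_ids, attention_mask):
            # Assumes a BERT-style encoder whose first output is the sequence
            # of hidden states; the [CLS] vector serves as the shared text
            # representation for both branches.
            hidden = self.encoder(input_ids, attention_mask=attention_mask)[0][:, 0]
            return self.head_orig(hidden), self.head_tail(hidden)

        @torch.no_grad()
        def predict(self, input_ids, attention_mask):
            logits_orig, logits_tail = self(input_ids, attention_mask)
            # Assumption: average the two branches' logits at inference.
            return torch.sigmoid(0.5 * (logits_orig + logits_tail))

During training, each head would compute its loss on batches drawn by its own sampler (one uniform over the data, one under-sampling instances of head labels), so that gradients from both distributions update the shared encoder.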
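
The abstract characterizes the gradient-balanced loss only by its effect: suppressing the negative gradients that accumulate on labels, especially tail labels. One common way to realize that effect in multi-label binary cross-entropy is to down-weight each label's negative term. The sketch below, with a hypothetical frequency-derived weight vector, illustrates the idea; it is not the paper's exact formulation.

    import torch
    import torch.nn.functional as F

    def gradient_balanced_bce(logits, targets, neg_weight):
        """logits, targets: (batch, num_labels); neg_weight: (num_labels,)
        with values in (0, 1]. Smaller weights shrink the negative term,
        and hence the negative gradients, for the corresponding labels."""
        pos_term = targets * F.logsigmoid(logits)           # loss on positive labels
        neg_term = (1 - targets) * F.logsigmoid(-logits)    # loss on negative labels
        return -(pos_term + neg_weight * neg_term).mean()

    # One hypothetical weighting: scale by relative label frequency, so
    # tail labels (small counts) get the strongest suppression.
    label_counts = torch.tensor([10_000.0, 500.0, 12.0])    # toy head/mid/tail counts
    neg_weight = (label_counts / label_counts.max()) ** 0.5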


Cited By

  • (2024) Label-text bi-attention capsule networks model for multi-label text classification. Neurocomputing 588:C. DOI: 10.1016/j.neucom.2024.127671. Online publication date: 17-Jul-2024

      Published In

      ACM Transactions on Information Systems, Volume 42, Issue 2
      March 2024
      897 pages
      EISSN: 1558-2868
      DOI: 10.1145/3618075

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 27 September 2023
      Online AM: 04 August 2023
      Accepted: 28 April 2023
      Revised: 26 February 2023
      Received: 12 May 2022
      Published in TOIS Volume 42, Issue 2


      Author Tags

      1. Multi-label text classification
      2. long-tailed learning
      3. dual-branch structure
      4. re-weighting loss function

      Qualifiers

      • Research-article

      Funding Sources

      • Natural Science Foundation of China
      • TJU-Wenge joint laboratory funding
      • Tianjin Research Innovation Project for Postgraduate Students

      Article Metrics

      • Downloads (last 12 months): 500
      • Downloads (last 6 weeks): 39
      Reflects downloads up to 10 Oct 2024

