
A Dual-branch Learning Model with Gradient-balanced Loss for Long-tailed Multi-label Text Classification

Published: 27 September 2023

Abstract

Multi-label text classification has a wide range of real-world applications. However, real-world data distributions are often imbalanced, which leads to serious long-tailed problems. For multi-label classification, owing to the vast scale of datasets and the existence of label co-occurrence, effectively improving the prediction accuracy of tail labels without degrading overall precision becomes an important challenge. To address this issue, we propose a Dual-Branch Learning Model with Gradient-Balanced Loss (DBGB), built on the paradigm of existing pre-trained state-of-the-art multi-label classification models. Our model introduces two improvements targeting the long-tailed problem. First, on top of a shared text representation, a dual classifier processes two label distributions: the original data distribution and a distribution in which head labels are under-sampled to strengthen the prediction of tail labels. Second, the proposed gradient-balanced loss adaptively suppresses the accumulation of negative gradients on labels, especially tail labels. We perform extensive experiments on three multi-label text classification datasets. The results show that the proposed method achieves competitive overall performance compared with state-of-the-art multi-label classification methods, with significant improvement in tail-label accuracy.
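
To make the dual-branch design concrete, the following is a minimal PyTorch sketch of the architecture as the abstract describes it: a shared encoder feeds two classifier heads, one trained on the original label distribution and one on a distribution with head labels under-sampled. The class name, the [CLS] pooling, and the logit-averaging fusion at inference are illustrative assumptions, not the paper's released implementation.

    import torch
    import torch.nn as nn

    class DualBranchClassifier(nn.Module):
        """Shared text representation feeding two classifier heads: one for
        the original label distribution, one for a distribution in which
        head labels are under-sampled (all names here are illustrative)."""

        def __init__(self, encoder: nn.Module, hidden_size: int, num_labels: int):
            super().__init__()
            self.encoder = encoder                                # shared representation
            self.head_orig = nn.Linear(hidden_size, num_labels)  # original-distribution branch
            self.head_tail = nn.Linear(hidden_size, num_labels)  # under-sampled branch

        def forward(self, input_ids, attention_mask):
            # Assumes a BERT-style encoder whose first output is the sequence
            # of hidden states; the [CLS] vector serves as the shared text
            # representation for both branches.
            hidden = self.encoder(input_ids, attention_mask=attention_mask)[0][:, 0]
            return self.head_orig(hidden), self.head_tail(hidden)

        @torch.no_grad()
        def predict(self, input_ids, attention_mask):
            logits_orig, logits_tail = self(input_ids, attention_mask)
            # Assumption: average the two branches' logits at inference.
            return torch.sigmoid(0.5 * (logits_orig + logits_tail))

During training, each head would compute its loss on batches drawn by its own sampler (one uniform over the data, one under-sampling instances of head labels), so that gradients from both distributions update the shared encoder.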
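
The abstract characterizes the gradient-balanced loss only by its effect: suppressing the negative gradients that accumulate on labels, especially tail labels. One common way to realize that effect in multi-label binary cross-entropy is to down-weight each label's negative term. The sketch below, with a hypothetical frequency-derived weight vector, illustrates the idea; it is not the paper's exact formulation.

    import torch
    import torch.nn.functional as F

    def gradient_balanced_bce(logits, targets, neg_weight):
        """logits, targets: (batch, num_labels); neg_weight: (num_labels,)
        with values in (0, 1]. Smaller weights shrink the negative term,
        and hence the negative gradients, for the corresponding labels."""
        pos_term = targets * F.logsigmoid(logits)           # loss on positive labels
        neg_term = (1 - targets) * F.logsigmoid(-logits)    # loss on negative labels
        return -(pos_term + neg_weight * neg_term).mean()

    # One hypothetical weighting: scale by relative label frequency, so
    # tail labels (small counts) get the strongest suppression.
    label_counts = torch.tensor([10_000.0, 500.0, 12.0])    # toy head/mid/tail counts
    neg_weight = (label_counts / label_counts.max()) ** 0.5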


Cited By

  • (2024) Label-text bi-attention capsule networks model for multi-label text classification. Neurocomputing 588:C. DOI: 10.1016/j.neucom.2024.127671. Online publication date: 17-Jul-2024

      Published In

      ACM Transactions on Information Systems, Volume 42, Issue 2
      March 2024
      897 pages
      EISSN: 1558-2868
      DOI: 10.1145/3618075

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 27 September 2023
      Online AM: 04 August 2023
      Accepted: 28 April 2023
      Revised: 26 February 2023
      Received: 12 May 2022
      Published in TOIS Volume 42, Issue 2


      Author Tags

      1. Multi-label text classification
      2. long-tailed learning
      3. dual-branch structure
      4. re-weighting loss function

      Qualifiers

      • Research-article

      Funding Sources

      • Natural Science Foundation of China
      • TJU-Wenge joint laboratory funding
      • Tianjin Research Innovation Project for Postgraduate Students

      Article Metrics

      • Downloads (last 12 months): 500
      • Downloads (last 6 weeks): 39
      Reflects downloads up to 10 Oct 2024

