On the Value of Head Labels in Multi-Label Text Classification

Published: 26 March 2024
Abstract

A formidable challenge in multi-label text classification (MLTC) is that the labels often exhibit a long-tailed distribution, which typically prevents deep MLTC models from obtaining satisfactory performance. To alleviate this problem, most existing solutions attempt to improve tail-label performance through sampling or by introducing extra knowledge. Data-rich head labels, though more trustworthy, have not received the attention they deserve. In this work, we propose a multi-stage training framework that exploits both model-level and feature-level knowledge from the head labels to improve the representation and generalization ability of MLTC models. Moreover, we theoretically prove the superiority of our framework design over the alternatives. Comprehensive experiments on widely used MLTC datasets demonstrate that the proposed framework substantially outperforms state-of-the-art methods, highlighting the value of head labels in MLTC.
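This page carries no code, and the framework's details live in the full text. Purely as a hedged illustration of the head-first idea the abstract describes, the sketch below partitions labels by training frequency and then trains in two stages: first on the data-rich head labels to shape the encoder's representations, then on the full label set starting from that head-trained encoder. Every name here (split_head_tail, MLTCModel, train_stage, the 20% head cutoff) is a hypothetical placeholder, not the authors' actual method.

```python
# Hypothetical two-stage, head-first training sketch (NOT the paper's actual
# algorithm). Assumes a PyTorch text encoder that maps a batch of inputs to
# fixed-size vectors, and a multi-hot label matrix of shape (n_docs, n_labels).
import numpy as np
import torch
import torch.nn as nn


def split_head_tail(label_matrix: np.ndarray, head_fraction: float = 0.2):
    """Rank labels by training frequency; the top fraction are 'head' labels."""
    freq = label_matrix.sum(axis=0)                # number of documents per label
    order = np.argsort(-freq)                      # most frequent labels first
    n_head = max(1, int(head_fraction * len(order)))
    return order[:n_head], order[n_head:]          # head label ids, tail label ids


class MLTCModel(nn.Module):
    """A text encoder followed by one linear classifier over all labels."""

    def __init__(self, encoder: nn.Module, hidden_dim: int, n_labels: int):
        super().__init__()
        self.encoder = encoder                     # e.g., a BERT-style encoder
        self.classifier = nn.Linear(hidden_dim, n_labels)

    def forward(self, x):
        return self.classifier(self.encoder(x))    # one logit per label


def train_stage(model, loader, label_ids, epochs=3, lr=1e-4):
    """Run one training stage restricted to a subset of label columns."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()               # standard multi-label loss
    ids = torch.as_tensor(label_ids)
    for _ in range(epochs):
        for x, y in loader:                        # y: float multi-hot targets
            logits = model(x)[:, ids]              # score only this label subset
            loss = loss_fn(logits, y[:, ids])
            opt.zero_grad()
            loss.backward()
            opt.step()


# Hypothetical usage:
#   head_ids, _ = split_head_tail(Y_train, head_fraction=0.2)
#   train_stage(model, loader, head_ids)                       # stage 1: head only
#   train_stage(model, loader, np.arange(Y_train.shape[1]))    # stage 2: all labels
```

The intuition, consistent with the abstract's framing, is that frequent labels provide plentiful, reliable gradients for representation learning, so a classifier trained on top of the head-trained encoder should generalize better to the data-scarce tail; the actual framework additionally transfers model- and feature-level knowledge in ways this sketch does not attempt to reproduce.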




      Published In

      ACM Transactions on Knowledge Discovery from Data  Volume 18, Issue 5
      June 2024
      699 pages
      ISSN:1556-4681
      EISSN:1556-472X
      DOI:10.1145/3613659

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 26 March 2024
      Online AM: 05 February 2024
      Accepted: 24 January 2024
      Revised: 14 December 2023
      Received: 26 May 2022
      Published in TKDD Volume 18, Issue 5


      Author Tags

      1. Multi-label text classification
      2. long-tail
      3. self-supervised learning

      Qualifiers

      • Research-article

      Funding Sources

      • Pioneer R&D Program of Zhejiang
