MCVIE: An Effective Batch-Mode Active Learning for Multi-label Text Classification

  • Conference paper
Natural Language Processing and Chinese Computing (NLPCC 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14302)

Abstract

Data labeling for multi-label text is a challenging task in natural language processing, and active learning has emerged as a promising approach to reduce annotation effort while improving model performance. The primary challenge in multi-label active learning is to develop query strategies that effectively select the most valuable unlabeled instances for annotation. Batch-mode active learning approaches, which select a batch of informative and diverse instances in each iteration, have been considered useful for improving annotation efficiency. However, challenges such as incomplete information ranking and high computational cost still hinder the progress of batch-mode methods. In this paper, we propose MCVIE, a novel batch-mode active learning method for multi-label text classification. MCVIE employs a two-stage query strategy. First, we combine two measures, prediction uncertainty and category vector inconsistency, to calculate a basic information score for each example-label pair. Then, we use the Euclidean distance between text feature vectors to iteratively select diverse and informative example-label pairs for annotation. Experimental results on three benchmark datasets demonstrate that MCVIE outperforms other competitive methods.
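
The two-stage strategy described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so everything here is an assumption made for illustration, including the function names (`binary_uncertainty`, `basic_information_score`, `select_pairs`), the uncertainty measure, the linear blend with a `weight` parameter, and the rule that multiplies each score by the distance to the nearest already-selected example. The category vector inconsistency is taken as a caller-supplied matrix because its definition is not stated here.

```python
import numpy as np


def binary_uncertainty(probs):
    """Assumed uncertainty measure: 1.0 at p = 0.5, 0.0 at p = 0 or p = 1."""
    return 1.0 - np.abs(2.0 * probs - 1.0)


def basic_information_score(uncertainty, inconsistency, weight=0.5):
    """Stage 1 (assumed form): blend two (n_examples, n_labels) matrices."""
    return weight * uncertainty + (1.0 - weight) * inconsistency


def select_pairs(scores, features, batch_size):
    """Stage 2 (assumed form): greedily pick (example, label) pairs that
    score high and whose examples lie far, in Euclidean distance, from
    the examples already in the batch."""
    batch_size = min(batch_size, scores.size)
    # Seed the batch with the single highest-scoring pair.
    i, l = np.unravel_index(np.argmax(scores), scores.shape)
    selected = [(int(i), int(l))]
    # Distance from every example to its nearest selected example.
    min_dist = np.linalg.norm(features - features[i], axis=1)
    while len(selected) < batch_size:
        # Weight each pair's score by its example's distance to the batch.
        combined = scores * min_dist[:, None]
        for si, sl in selected:
            combined[si, sl] = -np.inf  # never pick the same pair twice
        i, l = np.unravel_index(np.argmax(combined), combined.shape)
        selected.append((int(i), int(l)))
        dist_to_new = np.linalg.norm(features - features[i], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return selected


# Toy data: three examples, two labels, 2-D feature vectors.
probs = np.array([[0.5, 0.9], [0.45, 0.1], [0.4, 0.2]])
features = np.array([[0.0, 0.0], [0.0, 0.1], [10.0, 0.0]])
scores = basic_information_score(
    binary_uncertainty(probs), np.zeros_like(probs), weight=1.0
)
batch = select_pairs(scores, features, batch_size=3)
print(batch)
```

In this toy run the second pick goes to the distant third example rather than the near-duplicate second one, even though the second has the higher raw score, which is the kind of informative-and-diverse trade-off the abstract describes.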

Notes

  1. https://github.com/hanhan1214/active-learning-mcvie.

  2. https://huggingface.co/distilbert-base-uncased.

  3. https://www.kaggle.com/competitions/jigsaw-toxic-comment-classification-challenge/data.

  4. https://github.com/strwberry-smggls/ActiveLearningTextClassification/tree/main/AL/datasets.

  5. https://github.com/iliaschalkidis/lmtc-eurlex57k/tree/master/data/datasets.

  6. https://huggingface.co/datasets/go_emotions.

  7. This work was completed during the internship at China Mobile.

Author information

Corresponding author

Correspondence to Qing Wang.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Cheng, X., Zhou, F., Wang, Q., Wang, Y., Wang, Y. (2023). MCVIE: An Effective Batch-Mode Active Learning for Multi-label Text Classification. In: Liu, F., Duan, N., Xu, Q., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2023. Lecture Notes in Computer Science, vol 14302. Springer, Cham. https://doi.org/10.1007/978-3-031-44693-1_27

  • DOI: https://doi.org/10.1007/978-3-031-44693-1_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-44692-4

  • Online ISBN: 978-3-031-44693-1

  • eBook Packages: Computer Science, Computer Science (R0)
