Abstract
Data labeling for multi-label text is a challenging task in natural language processing, and active learning has emerged as a promising approach to reduce annotation effort while improving model performance. The primary challenge in multi-label active learning is to design query strategies that effectively select the most valuable unlabeled instances for annotation. Batch-mode active learning approaches, which select a batch of informative and diverse instances in each iteration, have proven useful for improving annotation efficiency. However, challenges such as incomplete information ranking and high computational cost still hinder the progress of batch-mode methods. In this paper, we propose MCVIE, a novel batch-mode active learning method for multi-label text. MCVIE employs a two-stage active learning query strategy. First, we combine two measures, prediction uncertainty and category vector inconsistency, to compute a basic information score for each example-label pair. Then, we use the Euclidean distance between text feature vectors to iteratively select diverse and informative example-label pairs for annotation. Experimental results on three benchmark datasets demonstrate that MCVIE outperforms other competitive methods.
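To make the two-stage query strategy concrete, the following is a minimal sketch in Python/NumPy. It assumes an entropy-based uncertainty for each example-label pair, a neighbor-derived label vector as a stand-in for the category vector used in the inconsistency term, and simple weights `alpha` and `beta` to trade off the two information measures and the diversity term; these particular formulas and parameter names are illustrative assumptions, not the exact definitions used in MCVIE.

```python
# Sketch of a two-stage batch-mode query strategy in the spirit of MCVIE.
# NOTE: the scoring formulas below are assumptions for illustration; the
# paper's exact uncertainty and category-vector-inconsistency definitions
# may differ.
import numpy as np


def basic_information_scores(probs, pred_labels, neighbor_labels, alpha=0.5):
    """Combine prediction uncertainty and category vector inconsistency.

    probs:           (n, L) predicted probability for each example-label pair
    pred_labels:     (n, L) binary predictions from the current model
    neighbor_labels: (n, L) label vector aggregated from labeled neighbors
                     (assumed stand-in for the paper's category vector)
    """
    eps = 1e-12
    # Uncertainty: binary entropy of each example-label probability.
    uncertainty = -(probs * np.log(probs + eps)
                    + (1 - probs) * np.log(1 - probs + eps))
    # Inconsistency: disagreement between the model's prediction and the
    # category vector inferred from labeled data.
    inconsistency = np.abs(pred_labels - neighbor_labels)
    # Per-example score: average the combined pair scores over labels.
    return (alpha * uncertainty + (1 - alpha) * inconsistency).mean(axis=1)


def select_batch(features, scores, batch_size, beta=0.5):
    """Greedily pick a batch that balances informativeness and diversity,
    with diversity measured by Euclidean distance in feature space.
    (In practice the two terms would typically be normalized first.)"""
    selected = [int(np.argmax(scores))]  # seed with the top-scoring example
    while len(selected) < batch_size:
        # Distance of every candidate to its nearest already-selected example.
        dists = np.linalg.norm(
            features[:, None, :] - features[selected][None, :, :], axis=2
        ).min(axis=1)
        combined = beta * scores + (1 - beta) * dists
        combined[selected] = -np.inf  # never re-pick selected items
        selected.append(int(np.argmax(combined)))
    return selected


# Toy usage with random data standing in for text features and model outputs.
rng = np.random.default_rng(0)
n, L, d = 200, 5, 32
probs = rng.random((n, L))
features = rng.normal(size=(n, d))
pred = (probs > 0.5).astype(float)
neigh = rng.integers(0, 2, size=(n, L)).astype(float)
batch = select_batch(features, basic_information_scores(probs, pred, neigh),
                     batch_size=8)
print(batch)
```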
Notes
This work was completed during the internship at China Mobile.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Cheng, X., Zhou, F., Wang, Q., Wang, Y., Wang, Y. (2023). MCVIE: An Effective Batch-Mode Active Learning for Multi-label Text Classification. In: Liu, F., Duan, N., Xu, Q., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2023. Lecture Notes in Computer Science(), vol 14302. Springer, Cham. https://doi.org/10.1007/978-3-031-44693-1_27
DOI: https://doi.org/10.1007/978-3-031-44693-1_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-44692-4
Online ISBN: 978-3-031-44693-1
eBook Packages: Computer Science, Computer Science (R0)