Abstract
Data labeling for multi-label text is a challenging task in natural language processing, and active learning has emerged as a promising approach to reduce annotation effort while improving model performance. The primary challenge in multi-label active learning is to design query strategies that effectively select the most valuable unlabeled instances for annotation. Batch-mode active learning approaches, which select a batch of informative and diverse instances in each iteration, have proven useful for improving annotation efficiency. However, challenges such as incomplete information ranking and high computational cost still hinder the progress of batch-mode methods. In this paper, we propose MCVIE, a novel batch-mode active learning method for multi-label text. MCVIE employs a two-stage active learning query strategy. First, we combine two measures, prediction uncertainty and category vector inconsistency, to compute a basic information score for each example-label pair. Then, we use the Euclidean distance between text feature vectors to iteratively select diverse and informative example-label pairs for annotation. Experimental results on three benchmark datasets demonstrate that MCVIE outperforms other competitive methods.
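To make the two-stage query strategy concrete, the following is a minimal sketch in Python/NumPy. It assumes an entropy-based uncertainty for each example-label pair, a neighbor-derived label vector as a stand-in for the category vector used in the inconsistency term, and simple weights `alpha` and `beta` to trade off the two information measures and the diversity term; these particular formulas and parameter names are illustrative assumptions, not the exact definitions used in MCVIE.

```python
# Sketch of a two-stage batch-mode query strategy in the spirit of MCVIE.
# NOTE: the scoring formulas below are assumptions for illustration; the
# paper's exact uncertainty and category-vector-inconsistency definitions
# may differ.
import numpy as np


def basic_information_scores(probs, pred_labels, neighbor_labels, alpha=0.5):
    """Combine prediction uncertainty and category vector inconsistency.

    probs:           (n, L) predicted probability for each example-label pair
    pred_labels:     (n, L) binary predictions from the current model
    neighbor_labels: (n, L) label vector aggregated from labeled neighbors
                     (assumed stand-in for the paper's category vector)
    """
    eps = 1e-12
    # Uncertainty: binary entropy of each example-label probability.
    uncertainty = -(probs * np.log(probs + eps)
                    + (1 - probs) * np.log(1 - probs + eps))
    # Inconsistency: disagreement between the model's prediction and the
    # category vector inferred from labeled data.
    inconsistency = np.abs(pred_labels - neighbor_labels)
    # Per-example score: average the combined pair scores over labels.
    return (alpha * uncertainty + (1 - alpha) * inconsistency).mean(axis=1)


def select_batch(features, scores, batch_size, beta=0.5):
    """Greedily pick a batch that balances informativeness and diversity,
    with diversity measured by Euclidean distance in feature space.
    (In practice the two terms would typically be normalized first.)"""
    selected = [int(np.argmax(scores))]  # seed with the top-scoring example
    while len(selected) < batch_size:
        # Distance of every candidate to its nearest already-selected example.
        dists = np.linalg.norm(
            features[:, None, :] - features[selected][None, :, :], axis=2
        ).min(axis=1)
        combined = beta * scores + (1 - beta) * dists
        combined[selected] = -np.inf  # never re-pick selected items
        selected.append(int(np.argmax(combined)))
    return selected


# Toy usage with random data standing in for text features and model outputs.
rng = np.random.default_rng(0)
n, L, d = 200, 5, 32
probs = rng.random((n, L))
features = rng.normal(size=(n, d))
pred = (probs > 0.5).astype(float)
neigh = rng.integers(0, 2, size=(n, L)).astype(float)
batch = select_batch(features, basic_information_scores(probs, pred, neigh),
                     batch_size=8)
print(batch)
```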
Notes
This work was completed during the internship at China Mobile.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Cheng, X., Zhou, F., Wang, Q., Wang, Y., Wang, Y. (2023). MCVIE: An Effective Batch-Mode Active Learning for Multi-label Text Classification. In: Liu, F., Duan, N., Xu, Q., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2023. Lecture Notes in Computer Science(), vol 14302. Springer, Cham. https://doi.org/10.1007/978-3-031-44693-1_27
DOI: https://doi.org/10.1007/978-3-031-44693-1_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-44692-4
Online ISBN: 978-3-031-44693-1
eBook Packages: Computer Science, Computer Science (R0)