CoMix: Confronting with Noisy Label Learning with Co-training Strategies on Textual Mislabeling

Published: 16 August 2024

Abstract

The existence of noisy labels is inevitable in real-world large-scale corpora. Because deep neural networks are notably prone to overfitting noisy samples, a language model's ability to resist noise is essential for efficient training. However, little attention has been paid to alleviating the influence of label noise in natural language processing. To address this problem, we present CoMix, a robust noise-resistant training strategy that leverages co-training to handle textual annotation errors in text classification tasks. In our proposed framework, the original training set is first split into labeled and unlabeled subsets according to a sample partition criterion, and label refurbishment is then applied to the unlabeled subset. We then perform textual interpolation in hidden space between samples of the updated subsets. Meanwhile, we train two diverged peer networks simultaneously with co-training strategies to avoid the accumulation of confirmation bias. Experimental results on three popular text classification benchmarks demonstrate the effectiveness of CoMix in bolstering the network's resistance to mislabeled samples under various noise types and ratios, where it also outperforms state-of-the-art methods.
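
The abstract describes the pipeline only at a high level, so the sketch below illustrates what one such training step could look like in PyTorch. It is a minimal sketch under assumptions the abstract does not spell out: the partition criterion is taken to be a two-component Gaussian mixture fitted on per-sample losses, refurbishment replaces the noisy subset's labels with the peer network's sharpened predictions, and the interpolation is mixup applied to encoder hidden states. All names (partition_by_loss, sharpen, hidden_space_mix, comix_style_step, encoder_a, head_a, ...) are hypothetical placeholders, not the authors' implementation.

```python
# Illustrative CoMix-style update: GMM loss partition + label refurbishment
# + hidden-space mixup, with the partition/labels supplied by a peer network.
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.mixture import GaussianMixture


def partition_by_loss(per_sample_loss: np.ndarray, threshold: float = 0.5):
    """Split sample indices into a 'clean' (kept as labeled) and a 'noisy'
    (treated as unlabeled) subset by fitting a two-component GMM to the
    per-sample losses; the low-mean component is assumed to be clean."""
    losses = per_sample_loss.reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, reg_covar=5e-4).fit(losses)
    clean_component = int(np.argmin(gmm.means_.ravel()))
    p_clean = gmm.predict_proba(losses)[:, clean_component]
    return np.where(p_clean > threshold)[0], np.where(p_clean <= threshold)[0]


def sharpen(probs: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """Temperature-sharpen refurbished soft labels for the noisy subset."""
    p = probs ** (1.0 / temperature)
    return p / p.sum(dim=-1, keepdim=True)


def hidden_space_mix(h1, y1, h2, y2, alpha: float = 0.75):
    """Interpolate hidden representations and soft labels (mixup in hidden space)."""
    lam = np.random.beta(alpha, alpha)
    lam = max(lam, 1.0 - lam)  # bias the mix toward the first sample
    return lam * h1 + (1.0 - lam) * h2, lam * y1 + (1.0 - lam) * y2


def comix_style_step(encoder_a, head_a, encoder_b, head_b, optimizer_a,
                     x, y_onehot, per_sample_loss_b):
    """One illustrative update of network A; peer network B supplies the
    loss-based partition and refurbished labels, so A does not confirm its
    own mistakes (confirmation bias)."""
    clean_idx, noisy_idx = partition_by_loss(per_sample_loss_b)
    noisy_idx = torch.as_tensor(noisy_idx, dtype=torch.long)
    # Samples in clean_idx simply keep their given labels.

    with torch.no_grad():
        # Label refurbishment: replace the noisy subset's labels with B's
        # sharpened softmax predictions.
        refurbished = sharpen(F.softmax(head_b(encoder_b(x[noisy_idx])), dim=-1))

    targets = y_onehot.clone().float()
    targets[noisy_idx] = refurbished

    # Hidden-space interpolation between samples of the updated (relabeled) set.
    h = encoder_a(x)
    perm = torch.randperm(x.size(0))
    h_mix, y_mix = hidden_space_mix(h, targets, h[perm], targets[perm])

    # Cross-entropy against the soft, mixed targets.
    loss = -(y_mix * F.log_softmax(head_a(h_mix), dim=-1)).sum(dim=-1).mean()
    optimizer_a.zero_grad()
    loss.backward()
    optimizer_a.step()
    return loss.item()
```

In a full co-training loop, a symmetric call would update network B from A's partition and refurbished labels, alternating between the two peers so that neither accumulates its own errors.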


      Published In

      ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 23, Issue 9
      September 2024
      186 pages
      EISSN: 2375-4702
      DOI: 10.1145/3613646

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 16 August 2024
      Online AM: 15 July 2024
      Accepted: 03 July 2024
      Revised: 07 April 2024
      Received: 23 September 2023
      Published in TALLIP Volume 23, Issue 9


      Author Tags

      1. Noisy label learning
      2. textual label noise
      3. co-training

      Qualifiers

      • Research-article

      Funding Sources

      • National Key R&D Programme of China
      • Major Project of Anhui Province
      • General Program of the National Natural Science Foundation of China
      • University Synergy Innovation Program of Anhui Province
