CoMix: Confronting with Noisy Label Learning with Co-training Strategies on Textual Mislabeling

Published: 16 August 2024

Abstract

The existence of noisy labels is inevitable in real-world large-scale corpora. Because deep neural networks are notably prone to overfitting noisy samples, a language model's ability to resist noise is essential for efficient training. However, little attention has been paid to alleviating the influence of label noise in natural language processing. To address this problem, we present CoMix, a robust noise-resistant training strategy that leverages co-training to handle textual annotation errors in text classification tasks. In our proposed framework, the original training set is first split into labeled and unlabeled subsets according to a sample partition criterion, and label refurbishment is then applied to the unlabeled subset. We then perform textual interpolation in hidden space between samples of the updated subsets. Meanwhile, we train two diverged peer networks simultaneously with co-training strategies to avoid the accumulation of confirmation bias. Experimental results on three popular text classification benchmarks demonstrate the effectiveness of CoMix in bolstering the network's resistance to mislabeled samples under various noise types and ratios, where it also outperforms state-of-the-art methods.
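
The abstract describes the pipeline only at a high level, so the sketch below illustrates what one such training step could look like in PyTorch. It is a minimal sketch under assumptions the abstract does not spell out: the partition criterion is taken to be a two-component Gaussian mixture fitted on per-sample losses, refurbishment replaces the noisy subset's labels with the peer network's sharpened predictions, and the interpolation is mixup applied to encoder hidden states. All names (partition_by_loss, sharpen, hidden_space_mix, comix_style_step, encoder_a, head_a, ...) are hypothetical placeholders, not the authors' implementation.

```python
# Illustrative CoMix-style update: GMM loss partition + label refurbishment
# + hidden-space mixup, with the partition/labels supplied by a peer network.
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.mixture import GaussianMixture


def partition_by_loss(per_sample_loss: np.ndarray, threshold: float = 0.5):
    """Split sample indices into a 'clean' (kept as labeled) and a 'noisy'
    (treated as unlabeled) subset by fitting a two-component GMM to the
    per-sample losses; the low-mean component is assumed to be clean."""
    losses = per_sample_loss.reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, reg_covar=5e-4).fit(losses)
    clean_component = int(np.argmin(gmm.means_.ravel()))
    p_clean = gmm.predict_proba(losses)[:, clean_component]
    return np.where(p_clean > threshold)[0], np.where(p_clean <= threshold)[0]


def sharpen(probs: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """Temperature-sharpen refurbished soft labels for the noisy subset."""
    p = probs ** (1.0 / temperature)
    return p / p.sum(dim=-1, keepdim=True)


def hidden_space_mix(h1, y1, h2, y2, alpha: float = 0.75):
    """Interpolate hidden representations and soft labels (mixup in hidden space)."""
    lam = np.random.beta(alpha, alpha)
    lam = max(lam, 1.0 - lam)  # bias the mix toward the first sample
    return lam * h1 + (1.0 - lam) * h2, lam * y1 + (1.0 - lam) * y2


def comix_style_step(encoder_a, head_a, encoder_b, head_b, optimizer_a,
                     x, y_onehot, per_sample_loss_b):
    """One illustrative update of network A; peer network B supplies the
    loss-based partition and refurbished labels, so A does not confirm its
    own mistakes (confirmation bias)."""
    clean_idx, noisy_idx = partition_by_loss(per_sample_loss_b)
    noisy_idx = torch.as_tensor(noisy_idx, dtype=torch.long)
    # Samples in clean_idx simply keep their given labels.

    with torch.no_grad():
        # Label refurbishment: replace the noisy subset's labels with B's
        # sharpened softmax predictions.
        refurbished = sharpen(F.softmax(head_b(encoder_b(x[noisy_idx])), dim=-1))

    targets = y_onehot.clone().float()
    targets[noisy_idx] = refurbished

    # Hidden-space interpolation between samples of the updated (relabeled) set.
    h = encoder_a(x)
    perm = torch.randperm(x.size(0))
    h_mix, y_mix = hidden_space_mix(h, targets, h[perm], targets[perm])

    # Cross-entropy against the soft, mixed targets.
    loss = -(y_mix * F.log_softmax(head_a(h_mix), dim=-1)).sum(dim=-1).mean()
    optimizer_a.zero_grad()
    loss.backward()
    optimizer_a.step()
    return loss.item()
```

In a full co-training loop, a symmetric call would update network B from A's partition and refurbished labels, alternating between the two peers so that neither accumulates its own errors.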


      Published In

      ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 23, Issue 9
      September 2024
      186 pages
      EISSN: 2375-4702
      DOI: 10.1145/3613646

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 16 August 2024
      Online AM: 15 July 2024
      Accepted: 03 July 2024
      Revised: 07 April 2024
      Received: 23 September 2023
      Published in TALLIP Volume 23, Issue 9


      Author Tags

      1. Noisy label learning
      2. textual label noise
      3. co-training

      Qualifiers

      • Research-article

      Funding Sources

      • National Key R&D Programme of China
      • Major Project of Anhui Province
      • General Program of the National Natural Science Foundation of China
      • University Synergy Innovation Program of Anhui Province
