DOI: 10.1145/3488560.3498376

An Ensemble Model for Combating Label Noise

Published: 15 February 2022

Abstract

Labels crawled from web services (e.g., querying images from search engines or collecting tags from social media images) are often noisy, and such label noise degrades the classification performance of the resulting deep neural network (DNN) models. In this paper, we propose an ensemble model consisting of two networks that prevents the model from memorizing noisy labels. Within our model, one network generates an anchoring label from its prediction on a weakly-augmented image. Meanwhile, we force its peer network, which takes the strongly-augmented version of the same image as input, to generate predictions close to the anchoring label for knowledge distillation. By observing the loss distribution, we use a mixture model to dynamically estimate the clean probability of each training sample and build a confident clean set. We then train both networks simultaneously on the clean set to minimize a loss function that combines an unsupervised matching loss (measuring the consistency of the two networks) and a supervised classification loss (measuring classification performance). We theoretically analyze the gradient of our loss function to show that it implicitly prevents memorization of wrong labels. Experiments on two simulated benchmarks and one real-world dataset demonstrate that our approach achieves substantial improvements over state-of-the-art methods.
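The abstract describes two mechanisms: mixture-model-based clean-sample selection from the per-sample loss distribution, and a two-network objective that combines a supervised classification term with an unsupervised matching term. Below is a minimal sketch of both, assuming a PyTorch/scikit-learn setting; every name in it (clean_probabilities, comatching_step, lambda_match, the two-component Gaussian mixture, the KL-based matching loss) is an illustrative assumption, not the authors' implementation.

```python
# A minimal, hypothetical sketch; names and hyperparameters are
# illustrative, not taken from the paper's code.
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.mixture import GaussianMixture


def clean_probabilities(per_sample_losses):
    """Fit a two-component Gaussian mixture to the per-sample loss
    distribution and return each sample's posterior probability of
    belonging to the low-mean ('clean') component."""
    losses = np.asarray(per_sample_losses).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, max_iter=100, reg_covar=5e-4)
    gmm.fit(losses)
    clean_component = gmm.means_.argmin()  # lower mean loss => clean
    return gmm.predict_proba(losses)[:, clean_component]


def comatching_step(net_a, net_b, weak_x, strong_x, labels, clean_mask,
                    lambda_match=1.0):
    """One training step: net_a's prediction on the weakly augmented
    image serves as the anchoring label; net_b, fed the strongly
    augmented image, is pulled toward that anchor (unsupervised matching
    loss), and both networks receive a supervised classification loss on
    the confident clean set only."""
    with torch.no_grad():  # the anchor acts as a fixed distillation target
        anchor = F.softmax(net_a(weak_x), dim=1)
    logits_b = net_b(strong_x)
    match_loss = F.kl_div(F.log_softmax(logits_b, dim=1), anchor,
                          reduction="batchmean")
    logits_a = net_a(weak_x)
    cls_loss = (F.cross_entropy(logits_a[clean_mask], labels[clean_mask])
                + F.cross_entropy(logits_b[clean_mask], labels[clean_mask]))
    return cls_loss + lambda_match * match_loss
```

In this sketch the anchoring label is detached from the computation graph so it acts purely as a distillation target, matching the abstract's description of one network anchoring its peer.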

Supplementary Material

MP4 File (WSDM22-fp033.mp4)
This is the presentation video for the paper "An Ensemble Model for Combating Label Noise". In the video, the author discusses the background of learning with label noise and introduces "Co-matching", an ensemble method for robustly training DNN models under label noise.




Published In

WSDM '22: Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining
February 2022
1690 pages
ISBN: 9781450391320
DOI: 10.1145/3488560
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. ensemble learning
  2. image classification
  3. noisy labels
  4. weakly supervised learning

Qualifiers

  • Research-article

Conference

WSDM '22

Acceptance Rates

Overall Acceptance Rate: 498 of 2,863 submissions, 17%

Article Metrics

  • Downloads (Last 12 months): 69
  • Downloads (Last 6 weeks): 6
Reflects downloads up to 13 Nov 2024


Cited By

  • (2024) Ensemble Network-Based Distillation for Hyperspectral Image Classification in the Presence of Label Noise. Remote Sensing 16:22 (4247). DOI: 10.3390/rs16224247. Online publication date: 14-Nov-2024.
  • (2024) Early-Late Dropout for DivideMix: Learning with Noisy Labels in Deep Neural Networks. 2024 International Joint Conference on Neural Networks (IJCNN), 1-8. DOI: 10.1109/IJCNN60899.2024.10650652. Online publication date: 30-Jun-2024.
  • (2024) Noisy-Correspondence Learning for Text-to-Image Person Re-Identification. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 27187-27196. DOI: 10.1109/CVPR52733.2024.02568. Online publication date: 16-Jun-2024.
  • (2024) False Positive Detection for Text-Based Person Retrieval. PRICAI 2024: Trends in Artificial Intelligence, 220-231. DOI: 10.1007/978-981-96-0119-6_22. Online publication date: 12-Nov-2024.
  • (2024) Distractor-Free Novel View Synthesis via Exploiting Memorization Effect in Optimization. Computer Vision – ECCV 2024, 477-493. DOI: 10.1007/978-3-031-72949-2_27. Online publication date: 31-Oct-2024.
  • (2023) CLNode: Curriculum Learning for Node Classification. Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, 670-678. DOI: 10.1145/3539597.3570385. Online publication date: 27-Feb-2023.
  • (2023) An explainable ensemble machine learning model to elucidate the influential drilling parameters based on rate of penetration prediction. Geoenergy Science and Engineering 231 (212231). DOI: 10.1016/j.geoen.2023.212231. Online publication date: Dec-2023.
  • (2022) Noise attention learning. Proceedings of the 36th International Conference on Neural Information Processing Systems, 23164-23177. DOI: 10.5555/3600270.3601953. Online publication date: 28-Nov-2022.
  • (2022) Data Valuation Algorithm for Inertial Measurement Unit-Based Human Activity Recognition. Sensors 23:1 (184). DOI: 10.3390/s23010184. Online publication date: 24-Dec-2022.
