DOI: 10.5555/3666122.3669379

Robust data pruning under label noise via maximizing re-labeling accuracy

Published: 30 May 2024

Abstract

Data pruning, which aims to downsize a large training set into a small informative subset, is crucial for reducing the enormous computational costs of modern deep learning. Though large-scale data collections invariably contain annotation noise and numerous robust learning methods have been developed, data pruning for the noise-robust learning scenario has received little attention. With state-of-the-art Re-labeling methods that self-correct erroneous labels during training, it is challenging to identify which subset induces the most accurate re-labeling of erroneous labels in the entire training set. In this paper, we formalize the problem of data pruning with re-labeling. We first show that the likelihood of a training example being correctly re-labeled is proportional to the prediction confidence of its neighborhood in the subset. Therefore, we propose a novel data pruning algorithm, Prune4ReL, that finds a subset maximizing the total neighborhood confidence of all training examples, thereby maximizing the re-labeling accuracy and generalization performance. Extensive experiments on four real-world noisy datasets and one synthetic noisy dataset show that Prune4ReL outperforms the baselines with Re-labeling models by up to 9.1%, as well as those with a standard model by up to 21.6%.
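
To make the selection objective concrete, the sketch below illustrates one way such a subset could be chosen greedily: each training example's neighborhood confidence is approximated by its best confidence-weighted similarity to any selected example, and examples are added one at a time to maximize the summed coverage. This is a minimal illustrative sketch under simplifying assumptions, not the authors' Prune4ReL implementation; the inputs (embeddings, confidences, budget) and the function name are hypothetical.

import numpy as np

def greedy_neighborhood_confidence_selection(embeddings: np.ndarray,
                                             confidences: np.ndarray,
                                             budget: int) -> list[int]:
    """Greedily pick `budget` indices so that the summed neighborhood
    confidence over all examples is (approximately) maximized.

    embeddings  : (n, d) L2-normalized feature vectors (hypothetical input).
    confidences : (n,)  per-example prediction confidences in [0, 1].
    budget      : number of examples to keep after pruning.
    """
    n = embeddings.shape[0]
    similarity = embeddings @ embeddings.T   # cosine similarity matrix
    coverage = np.zeros(n)                   # current neighborhood confidence of each example
    remaining = np.ones(n, dtype=bool)
    selected: list[int] = []

    for _ in range(budget):
        # Marginal gain of each remaining candidate j: how much the total
        # coverage grows if example j (with confidence confidences[j]) is added.
        candidate_cov = np.maximum(coverage[None, :], confidences[:, None] * similarity)
        gains = np.where(remaining, candidate_cov.sum(axis=1) - coverage.sum(), -np.inf)
        j = int(np.argmax(gains))
        selected.append(j)
        remaining[j] = False
        coverage = np.maximum(coverage, confidences[j] * similarity[j])

    return selected

# Toy usage: keep 100 of 1,000 randomly generated examples.
rng = np.random.default_rng(0)
feats = rng.normal(size=(1000, 32))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
confs = rng.uniform(size=1000)
kept = greedy_neighborhood_confidence_selection(feats, confs, budget=100)

Because the gain of adding an example never increases as more examples are selected, a greedy procedure of this kind is the standard heuristic for such coverage-style objectives; the actual algorithm in the paper may differ in how the neighborhood and its confidence are defined.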

Supplementary Material

Additional material (3666122.3669379_supp.pdf)
Supplemental material.


Published In

NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems
December 2023
80772 pages

Publisher

Curran Associates Inc.

Red Hook, NY, United States

Publication History

Published: 30 May 2024

Qualifiers

  • Research-article
  • Research
  • Refereed limited
