DOI: 10.1007/978-3-030-58583-9_44

NoiseRank: Unsupervised Label Noise Reduction with Dependence Models

Published: 23 August 2020

Abstract

Label noise is increasingly prevalent in datasets acquired from noisy channels. Existing approaches that detect and remove label noise generally rely on some form of supervision, which is not scalable and is error-prone. In this paper, we propose NoiseRank, a method for unsupervised label noise reduction using Markov Random Fields (MRFs). We construct a dependence model to estimate the posterior probability that an instance is incorrectly labeled given the dataset, and rank instances by their estimated probabilities. Our method (i) does not require supervision from ground-truth labels or priors on the label or noise distribution, (ii) is interpretable by design, enabling transparency in label noise removal, and (iii) is agnostic to the classifier architecture/optimization framework and to content modality. These advantages enable wide applicability in real noise settings, unlike prior works constrained by one or more of these conditions. NoiseRank improves state-of-the-art classification on Food101-N (20% noise) and is effective on the high-noise Clothing-1M dataset (40% noise).
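To make the rank-then-prune workflow concrete, the Python sketch below is a rough illustration only, under assumptions not taken from the paper: it replaces the MRF dependence model with a simple similarity-weighted k-nearest-neighbor vote over a feature embedding, scores each instance by how strongly its neighbors' labels disagree with its own, and ranks instances by that score. The function name noise_rank_scores, the exponential distance weighting, the temperature parameter, and the use of scikit-learn's NearestNeighbors are all illustrative choices, not the authors' implementation.

```python
# Illustrative only: a similarity-weighted kNN "vote" stands in for the
# paper's MRF dependence model. Higher score = more likely mislabeled.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def noise_rank_scores(embeddings, labels, k=10, temperature=1.0):
    """Score each instance by neighbor-label disagreement.

    embeddings  : (n, d) array of instance features
    labels      : (n,) array of (possibly noisy) class labels
    k           : number of neighbors that vote (illustrative hyperparameter)
    temperature : controls how fast a neighbor's influence decays with distance
    """
    nn = NearestNeighbors(n_neighbors=k + 1).fit(embeddings)
    dist, idx = nn.kneighbors(embeddings)   # position 0 is the query itself
    dist, idx = dist[:, 1:], idx[:, 1:]     # drop self-matches

    weights = np.exp(-dist / temperature)   # closer neighbors count more
    scores = np.empty(len(labels), dtype=float)
    for i in range(len(labels)):
        same = labels[idx[i]] == labels[i]
        agree, disagree = weights[i][same].sum(), weights[i][~same].sum()
        # Net vote against the assigned label, normalized to [-1, 1].
        scores[i] = (disagree - agree) / (agree + disagree + 1e-12)
    return scores

# Usage: rank all instances by suspicion, then prune the top-ranked ones
# before retraining the classifier on the cleaned dataset.
# ranking = np.argsort(-noise_rank_scores(X, y, k=10))
```

In the paper itself the posterior is estimated with a Markov Random Field over neighbor dependencies rather than this heuristic vote; the sketch conveys only the unsupervised score-rank-prune pattern that needs no ground-truth labels or noise priors.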


Cited By

  • TabMentor: Detect Errors on Tabular Data with Noisy Labels. In: Advanced Data Mining and Applications, pp. 167–182 (2023). https://doi.org/10.1007/978-3-031-46671-7_12


Published In

Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXVII
August 2020, 828 pages
ISBN: 978-3-030-58582-2
DOI: 10.1007/978-3-030-58583-9

Publisher

Springer-Verlag, Berlin, Heidelberg

Author Tags

  1. Label noise
  2. Unsupervised learning
  3. Classification

