DOI: 10.1007/978-3-031-19806-9_30

Rethinking Confidence Calibration for Failure Prediction

Published: 23 October 2022

Abstract

Reliable confidence estimation for predictions is important in many safety-critical applications, yet modern deep neural networks are often overconfident in their incorrect predictions. Recently, many calibration methods have been proposed to alleviate this overconfidence. With calibrated confidence, a primary and practical goal is to detect misclassification errors by filtering out low-confidence predictions (known as failure prediction). In this paper, we identify a general, widespread, yet largely neglected phenomenon: most confidence calibration methods are useless or even harmful for failure prediction. We investigate this problem and find that popular calibration methods often worsen the confidence separation between correct and incorrect samples, making it harder to decide whether to trust a prediction. Finally, inspired by the natural connection between flat minima and confidence separation, we propose a simple hypothesis: flat minima are beneficial for failure prediction. We verify this hypothesis through extensive experiments and further boost performance by combining two different flat-minima techniques. Our code is available at https://github.com/Impression2805/FMFP.
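
As a concrete illustration of the failure-prediction protocol described above, the following sketch (illustrative only, not the authors' code; the function names and toy data are invented for this example) scores each prediction with its maximum softmax probability and measures, via AUROC, how well that confidence separates correct from incorrect predictions, which is exactly the separation the paper finds calibration methods often degrade.

import numpy as np

def softmax(logits):
    """Row-wise softmax over class logits."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def failure_prediction_auroc(logits, labels):
    """AUROC of confidence as a detector of correct predictions.

    1.0 means every correct prediction is more confident than every
    incorrect one; 0.5 means confidence carries no information.
    """
    probs = softmax(logits)
    confidence = probs.max(axis=1)          # max softmax probability
    correct = probs.argmax(axis=1) == labels
    # Rank-based AUROC (Mann-Whitney U statistic); assumes both correct
    # and incorrect predictions are present.
    ranks = np.empty(len(confidence))
    ranks[np.argsort(confidence)] = np.arange(1, len(confidence) + 1)
    n_pos, n_neg = correct.sum(), (~correct).sum()
    u = ranks[correct].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

# Toy usage: random logits, so the AUROC should hover near chance (0.5).
rng = np.random.default_rng(0)
print(failure_prediction_auroc(rng.normal(size=(1000, 10)),
                               rng.integers(0, 10, size=1000)))

The abstract does not name the two flat-minima techniques it combines. Stochastic weight averaging (SWA) is one widely used flat-minima method and is sketched below under the assumption of a PyTorch setup (again illustrative, not the authors' training recipe): weights visited late in training are averaged into a single solution that tends to lie in a flatter region of the loss landscape.

import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.optim.swa_utils import AveragedModel, SWALR, update_bn

# Toy classification data standing in for a real training set.
xs, ys = torch.randn(512, 784), torch.randint(0, 10, (512,))
train_loader = DataLoader(TensorDataset(xs, ys), batch_size=64, shuffle=True)

model = torch.nn.Linear(784, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
swa_model = AveragedModel(model)        # maintains a running average of weights
swa_scheduler = SWALR(optimizer, swa_lr=0.05)
swa_start = 15                          # begin averaging late in training

for epoch in range(20):
    for x, y in train_loader:
        loss = torch.nn.functional.cross_entropy(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    if epoch >= swa_start:
        swa_model.update_parameters(model)  # fold current weights into the average
        swa_scheduler.step()

update_bn(train_loader, swa_model)      # refresh BatchNorm statistics (no-op here)

Evaluating swa_model with a measure like failure_prediction_auroc above is, in spirit, how one would test the hypothesis that flatter minima yield better separation between correct and incorrect predictions.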


Published In

Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXV
Oct 2022
814 pages
ISBN:978-3-031-19805-2
DOI:10.1007/978-3-031-19806-9

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. Failure prediction
  2. Confidence calibration
  3. Flat minima
  4. Uncertainty
  5. Misclassification detection
  6. Selective classification

Qualifiers

  • Article


Cited By

  • Balanced Confidence Calibration for Graph Neural Networks. In: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 3747–3757 (2024). DOI: 10.1145/3637528.3671741
  • A robust rice yield estimation framework developed by grading modeling and normalized weight decision-making strategy using UAV imaging technology. Computers and Electronics in Agriculture 215:C (2024). DOI: 10.1016/j.compag.2023.108417
  • Context-Aware Confidence Estimation for Rejection in Handwritten Chinese Text Recognition. In: Document Analysis and Recognition - ICDAR 2024, pp. 134–151 (2024). DOI: 10.1007/978-3-031-70533-5_9
  • Two sides of miscalibration. In: Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, pp. 77–87 (2023). DOI: 10.5555/3625834.3625842
