DOI: 10.1007/978-3-031-19806-9_30

Rethinking Confidence Calibration for Failure Prediction

Published: 23 October 2022

Abstract

Reliable confidence estimation for predictions is important in many safety-critical applications, yet modern deep neural networks are often overconfident in their incorrect predictions. Recently, many calibration methods have been proposed to alleviate this overconfidence. With calibrated confidence, a primary and practical goal is to detect misclassification errors by filtering out low-confidence predictions (known as failure prediction). In this paper, we identify a general, widespread, yet largely neglected phenomenon: most confidence calibration methods are useless or even harmful for failure prediction. We investigate this problem and find that popular calibration methods often worsen the confidence separation between correct and incorrect samples, making it harder to decide whether to trust a prediction. Finally, inspired by the natural connection between flat minima and confidence separation, we propose a simple hypothesis: flat minima are beneficial for failure prediction. We verify this hypothesis through extensive experiments and further boost performance by combining two different flat-minima techniques. Our code is available at https://github.com/Impression2805/FMFP.
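
As a concrete illustration of the failure-prediction protocol described above, the following sketch (illustrative only, not the authors' code; the function names and toy data are invented for this example) scores each prediction with its maximum softmax probability and measures, via AUROC, how well that confidence separates correct from incorrect predictions, which is exactly the separation the paper finds calibration methods often degrade.

import numpy as np

def softmax(logits):
    """Row-wise softmax over class logits."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def failure_prediction_auroc(logits, labels):
    """AUROC of confidence as a detector of correct predictions.

    1.0 means every correct prediction is more confident than every
    incorrect one; 0.5 means confidence carries no information.
    """
    probs = softmax(logits)
    confidence = probs.max(axis=1)          # max softmax probability
    correct = probs.argmax(axis=1) == labels
    # Rank-based AUROC (Mann-Whitney U statistic); assumes both correct
    # and incorrect predictions are present.
    ranks = np.empty(len(confidence))
    ranks[np.argsort(confidence)] = np.arange(1, len(confidence) + 1)
    n_pos, n_neg = correct.sum(), (~correct).sum()
    u = ranks[correct].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

# Toy usage: random logits, so the AUROC should hover near chance (0.5).
rng = np.random.default_rng(0)
print(failure_prediction_auroc(rng.normal(size=(1000, 10)),
                               rng.integers(0, 10, size=1000)))

The abstract does not name the two flat-minima techniques it combines. Stochastic weight averaging (SWA) is one widely used flat-minima method and is sketched below under the assumption of a PyTorch setup (again illustrative, not the authors' training recipe): weights visited late in training are averaged into a single solution that tends to lie in a flatter region of the loss landscape.

import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.optim.swa_utils import AveragedModel, SWALR, update_bn

# Toy classification data standing in for a real training set.
xs, ys = torch.randn(512, 784), torch.randint(0, 10, (512,))
train_loader = DataLoader(TensorDataset(xs, ys), batch_size=64, shuffle=True)

model = torch.nn.Linear(784, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
swa_model = AveragedModel(model)        # maintains a running average of weights
swa_scheduler = SWALR(optimizer, swa_lr=0.05)
swa_start = 15                          # begin averaging late in training

for epoch in range(20):
    for x, y in train_loader:
        loss = torch.nn.functional.cross_entropy(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    if epoch >= swa_start:
        swa_model.update_parameters(model)  # fold current weights into the average
        swa_scheduler.step()

update_bn(train_loader, swa_model)      # refresh BatchNorm statistics (no-op here)

Evaluating swa_model with a measure like failure_prediction_auroc above is, in spirit, how one would test the hypothesis that flatter minima yield better separation between correct and incorrect predictions.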


Published In

Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXV
Oct 2022
814 pages
ISBN:978-3-031-19805-2
DOI:10.1007/978-3-031-19806-9

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. Failure prediction
  2. Confidence calibration
  3. Flat minima
  4. Uncertainty
  5. Misclassification detection
  6. Selective classification

Qualifiers

  • Article


Cited By

  • Balanced Confidence Calibration for Graph Neural Networks. In: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 3747–3757 (2024). DOI: 10.1145/3637528.3671741
  • A robust rice yield estimation framework developed by grading modeling and normalized weight decision-making strategy using UAV imaging technology. Computers and Electronics in Agriculture 215:C (2024). DOI: 10.1016/j.compag.2023.108417
  • Context-Aware Confidence Estimation for Rejection in Handwritten Chinese Text Recognition. In: Document Analysis and Recognition - ICDAR 2024, pp. 134–151 (2024). DOI: 10.1007/978-3-031-70533-5_9
  • Two sides of miscalibration. In: Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, pp. 77–87 (2023). DOI: 10.5555/3625834.3625842
