DOI: 10.1145/3637528.3671672

Towards Test Time Adaptation via Calibrated Entropy Minimization

Published: 24 August 2024

Abstract

Robust models must generalize well even amid environmental change. However, the complex variability and noise of real-world data often open a pronounced performance gap between the training and testing phases. To address this challenge, researchers have recently introduced test-time adaptation (TTA), in which a source-pretrained model is adapted to a target domain using only unlabeled test data. This study finds that existing TTA methods treat only the largest logit as a pseudo-label and minimize the entropy of test-time predictions, thereby maximizing the model's predictive confidence. However, this drives the model toward overconfidence in local test scenarios. In response, we introduce a novel confidence-calibration loss function, Calibrated Entropy Test-Time Adaptation (CETA), which considers both the model's largest logit and the next-highest-ranked one, aiming to strike a balance between overconfidence and underconfidence. This is achieved by incorporating a sample-wise regularization term. We also provide a theoretical foundation for the proposed loss function. Experimentally, our method outperforms existing strategies on benchmark corruption datasets across multiple models, underscoring the efficacy of our approach.
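To make the idea concrete, the sketch below contrasts plain entropy minimization (the Tent-style objective the abstract critiques) with a calibrated variant that also looks at the runner-up probability. The exact form of CETA's regularizer is not given in the abstract, so the margin penalty and the weight `lam` here are illustrative assumptions, not the authors' loss.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy_loss(logits):
    """Mean Shannon entropy of the predictions: the objective that
    entropy-minimization TTA methods (e.g., Tent) drive toward zero."""
    p = softmax(logits)
    return float(-(p * np.log(p + 1e-12)).sum(axis=-1).mean())

def calibrated_entropy_loss(logits, lam=0.1):
    """Hypothetical CETA-style loss (an assumption, not the paper's exact
    formula): entropy plus a per-sample penalty on the gap between the
    largest and second-largest probabilities, discouraging overconfidence."""
    p = softmax(logits)
    top2 = np.sort(p, axis=-1)[..., -2:]      # [second-largest, largest]
    margin = top2[..., 1] - top2[..., 0]       # confidence gap, >= 0
    ent = -(p * np.log(p + 1e-12)).sum(axis=-1)
    return float((ent + lam * margin).mean())
```

For a very confident prediction the entropy term is near zero, so plain entropy minimization has nothing left to push against; the margin term in the calibrated variant still penalizes the sample, which is one way to read the abstract's "balance between overconfidence and underconfidence".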


Published In

KDD '24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2024
6901 pages
ISBN: 9798400704901
DOI: 10.1145/3637528
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. entropy minimization
  2. model calibration
  3. test-time adaptation

Qualifiers

  • Research-article

Conference

KDD '24
Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 194
  • Downloads (last 12 months): 194
  • Downloads (last 6 weeks): 20

Reflects downloads up to 16 February 2025
