DOI: 10.1145/3637528.3671672

Towards Test Time Adaptation via Calibrated Entropy Minimization

Published: 24 August 2024

Abstract

Robust models must generalize well even amid environmental change. However, the complex variability and noise of real-world data often open a pronounced performance gap between the training and testing phases. To address this challenge, researchers have recently introduced test-time adaptation (TTA), in which a source-pretrained model is adapted to a target domain using only unlabeled test data. This study finds that existing TTA methods treat only the largest logit as a pseudo-label and minimize the entropy of test-time predictions, thereby maximizing the model's predictive confidence. However, this drives the model toward overconfidence in local test scenarios. In response, we introduce a novel confidence-calibration loss function, Calibrated Entropy Test-Time Adaptation (CETA), which considers both the model's largest logit and the next-highest-ranked one, aiming to strike a balance between overconfidence and underconfidence. This is achieved by incorporating a sample-wise regularization term. We also provide a theoretical foundation for the proposed loss function. Experimentally, our method outperforms existing strategies on benchmark corruption datasets across multiple models, underscoring the efficacy of our approach.
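To make the idea concrete, the sketch below contrasts plain entropy minimization (the Tent-style objective the abstract critiques) with a calibrated variant that also looks at the runner-up probability. The exact form of CETA's regularizer is not given in the abstract, so the margin penalty and the weight `lam` here are illustrative assumptions, not the authors' loss.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy_loss(logits):
    """Mean Shannon entropy of the predictions: the objective that
    entropy-minimization TTA methods (e.g., Tent) drive toward zero."""
    p = softmax(logits)
    return float(-(p * np.log(p + 1e-12)).sum(axis=-1).mean())

def calibrated_entropy_loss(logits, lam=0.1):
    """Hypothetical CETA-style loss (an assumption, not the paper's exact
    formula): entropy plus a per-sample penalty on the gap between the
    largest and second-largest probabilities, discouraging overconfidence."""
    p = softmax(logits)
    top2 = np.sort(p, axis=-1)[..., -2:]      # [second-largest, largest]
    margin = top2[..., 1] - top2[..., 0]       # confidence gap, >= 0
    ent = -(p * np.log(p + 1e-12)).sum(axis=-1)
    return float((ent + lam * margin).mean())
```

For a very confident prediction the entropy term is near zero, so plain entropy minimization has nothing left to push against; the margin term in the calibrated variant still penalizes the sample, which is one way to read the abstract's "balance between overconfidence and underconfidence".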


Published In

KDD '24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2024
6901 pages
ISBN: 9798400704901
DOI: 10.1145/3637528
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. entropy minimization
  2. model calibration
  3. test-time adaptation

Qualifiers

  • Research-article

Conference

KDD '24
Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 194
  • Downloads (last 12 months): 194
  • Downloads (last 6 weeks): 20

Reflects downloads up to 16 February 2025
