DOI: 10.5555/3666122.3667155
Research article

Beyond invariance: test-time label-shift adaptation for addressing "spurious" correlations

Published: 30 May 2024

Abstract

Changes in the data distribution at test time can have deleterious effects on the performance of predictive models p(y|x). We consider situations where there are additional meta-data labels (such as group labels), denoted by z, that can account for such changes in the distribution. In particular, we assume that the prior distribution p(y, z), which models the dependence between the class label y and the "nuisance" factors z, may change across domains, either due to a change in the correlation between these terms, or a change in one of their marginals. However, we assume that the generative model for features p(x|y, z) is invariant across domains. We note that this corresponds to an expanded version of the widely used "label shift" assumption, where the labels now also include the nuisance factors z. Based on this observation, we propose a test-time label shift correction that adapts to changes in the joint distribution p(y, z) using EM applied to unlabeled samples from the target domain distribution, p_t(x). Importantly, we are able to avoid fitting a generative model p(x|y, z), and merely need to reweight the outputs of a discriminative model p_s(y, z|x) trained on the source distribution. We evaluate our method, which we call "Test-Time Label-Shift Adaptation" (TTLSA), on several standard image and text datasets, as well as the CheXpert chest X-ray dataset, and show that it improves performance over methods that target invariance to changes in the distribution, as well as baseline empirical risk minimization methods. Code for reproducing experiments is available at https://github.com/nalzok/test-time-label-shift.
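The EM-based correction the abstract describes has a classical closed form (in the style of Saerens et al.'s prior-adjustment procedure): treat the joint label u = (y, z) as a single categorical label, and alternate between reweighting the source classifier's posteriors by the ratio of the estimated target prior to the source prior (E-step) and averaging the reweighted posteriors over the unlabeled target batch to re-estimate that prior (M-step). The sketch below is an illustrative reconstruction under these assumptions, not the authors' released implementation; the function name and arguments are hypothetical.

```python
import numpy as np

def em_prior_adaptation(probs_src, prior_src, n_iter=100, tol=1e-8):
    """EM re-estimation of the target prior over joint labels u = (y, z).

    probs_src: (n, k) array of source-classifier posteriors p_s(u | x)
               for n unlabeled target samples and k joint label values.
    prior_src: (k,) source-domain prior p_s(u).
    Returns the estimated target prior p_t(u) and the adapted
    per-sample posteriors p_t(u | x).
    """
    prior_t = prior_src.copy()
    post = probs_src
    for _ in range(n_iter):
        # E-step: reweight source posteriors by the prior ratio and renormalize
        w = probs_src * (prior_t / prior_src)
        post = w / w.sum(axis=1, keepdims=True)
        # M-step: new prior estimate = mean adapted posterior over the batch
        new_prior = post.mean(axis=0)
        if np.abs(new_prior - prior_t).max() < tol:
            prior_t = new_prior
            break
        prior_t = new_prior
    return prior_t, post
```

Because only the classifier's output probabilities and the source prior are needed, the adaptation runs at test time without refitting the model; the final prediction for y marginalizes the adapted joint posterior over z.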

Supplementary Material

Supplemental material: 3666122.3667155_supp.pdf


Published In

NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems
December 2023, 80772 pages
Publisher: Curran Associates Inc., Red Hook, NY, United States

