
A too-good-to-be-true prior to reduce shortcut reliance

Published: 01 February 2023

Highlights

Challenging machine learning problems are unlikely to have trivial solutions.
Solutions from low-capacity models are likely shortcuts that won’t generalize.
One inductive bias for robust generalization is to avoid overly simple solutions.
A low-capacity model can identify shortcuts to help train a high-capacity model.

Abstract

Despite their impressive performance in object recognition and other tasks under standard testing conditions, deep networks often fail to generalize to out-of-distribution (o.o.d.) samples. One cause for this shortcoming is that modern architectures tend to rely on "shortcuts": superficial features that correlate with categories but do not capture the deeper invariants that hold across contexts. Real-world concepts often possess a complex structure that can vary superficially across contexts, so the most intuitive and promising solution in one context may not generalize to others. One potential way to improve o.o.d. generalization is to assume that simple solutions are unlikely to be valid across contexts and to avoid them, which we refer to as the too-good-to-be-true prior. A low-capacity network (LCN) with a shallow architecture should only be able to learn surface relationships, including shortcuts. We find that LCNs can serve as shortcut detectors. Furthermore, an LCN's predictions can be used in a two-stage approach to encourage a high-capacity network (HCN) to rely on deeper invariant features that should generalize broadly. In particular, items that the LCN can master are downweighted when training the HCN. Using a modified version of the CIFAR-10 dataset in which we introduced shortcuts, we found that the two-stage LCN-HCN approach reduced reliance on shortcuts and facilitated o.o.d. generalization.
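
To make the two-stage idea concrete, below is a minimal PyTorch sketch of one way the LCN-HCN scheme could be implemented. The shallow architecture and the specific downweighting rule (weighting each item by one minus the LCN's confidence in its correct label) are illustrative assumptions, not the paper's exact formulation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LCN(nn.Module):
    # Shallow, low-capacity network: expressive enough for surface
    # features (shortcuts) but not for deeper invariants.
    def __init__(self, num_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
            nn.Flatten(),
            nn.Linear(8 * 4 * 4, num_classes),
        )

    def forward(self, x):
        return self.net(x)

def shortcut_weights(lcn, x, y):
    # Stage one: items the trained LCN masters are suspected shortcut
    # carriers. Assumed rule: weight = 1 - p_LCN(correct class).
    with torch.no_grad():
        p = F.softmax(lcn(x), dim=1)
        p_correct = p.gather(1, y.unsqueeze(1)).squeeze(1)
    return 1.0 - p_correct

def hcn_step(hcn, lcn, optimizer, x, y):
    # Stage two: weighted cross-entropy steers the HCN toward items
    # the LCN could not solve with surface features alone.
    w = shortcut_weights(lcn, x, y)
    per_item = F.cross_entropy(hcn(x), y, reduction="none")
    loss = (w * per_item).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

Under this rule, items the LCN classifies with high confidence (likely shortcut-driven) contribute little to the HCN's loss, while items the LCN cannot master retain close to full weight.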

Cited By

  • (2024) All You Need Is a Guiding Hand: Mitigating Shortcut Bias in Deep Learning Models for Medical Imaging. In: Ethics and Fairness in Medical Imaging, pp. 67-77. DOI: 10.1007/978-3-031-72787-0_7. Online publication date: 6 October 2024.

      Published In

      Pattern Recognition Letters, Volume 166, Issue C, February 2023 (218 pages)

      Publisher

      Elsevier Science Inc., United States

      Author Tags

      1. Shortcut learning
      2. Out-of-distribution generalization
      3. Robustness
      4. Deep learning

      Qualifiers

      • Research-article
