
A too-good-to-be-true prior to reduce shortcut reliance

Published: 01 February 2023

Highlights

Challenging machine learning problems are unlikely to have trivial solutions.
Solutions from low-capacity models are likely shortcuts that won’t generalize.
One inductive bias for robust generalization is to avoid overly simple solutions.
A low-capacity model can identify shortcuts to help train a high-capacity model.

Abstract

Despite their impressive performance in object recognition and other tasks under standard testing conditions, deep networks often fail to generalize to out-of-distribution (o.o.d.) samples. One cause for this shortcoming is that modern architectures tend to rely on "shortcuts": superficial features that correlate with categories but do not capture the deeper invariants that hold across contexts. Real-world concepts often possess a complex structure that can vary superficially across contexts, so the most intuitive and promising solution in one context may not generalize to others. One potential way to improve o.o.d. generalization is to assume that simple solutions are unlikely to be valid across contexts and to avoid them, which we refer to as the too-good-to-be-true prior. A low-capacity network (LCN) with a shallow architecture should only be able to learn surface relationships, including shortcuts. We find that LCNs can serve as shortcut detectors. Furthermore, an LCN's predictions can be used in a two-stage approach to encourage a high-capacity network (HCN) to rely on deeper invariant features that should generalize broadly. In particular, items that the LCN can master are downweighted when training the HCN. Using a modified version of the CIFAR-10 dataset in which we introduced shortcuts, we found that the two-stage LCN-HCN approach reduced reliance on shortcuts and facilitated o.o.d. generalization.
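
To make the two-stage idea concrete, below is a minimal PyTorch sketch of one way the LCN-HCN scheme could be implemented. The shallow architecture and the specific downweighting rule (weighting each item by one minus the LCN's confidence in its correct label) are illustrative assumptions, not the paper's exact formulation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LCN(nn.Module):
    # Shallow, low-capacity network: expressive enough for surface
    # features (shortcuts) but not for deeper invariants.
    def __init__(self, num_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
            nn.Flatten(),
            nn.Linear(8 * 4 * 4, num_classes),
        )

    def forward(self, x):
        return self.net(x)

def shortcut_weights(lcn, x, y):
    # Stage one: items the trained LCN masters are suspected shortcut
    # carriers. Assumed rule: weight = 1 - p_LCN(correct class).
    with torch.no_grad():
        p = F.softmax(lcn(x), dim=1)
        p_correct = p.gather(1, y.unsqueeze(1)).squeeze(1)
    return 1.0 - p_correct

def hcn_step(hcn, lcn, optimizer, x, y):
    # Stage two: weighted cross-entropy steers the HCN toward items
    # the LCN could not solve with surface features alone.
    w = shortcut_weights(lcn, x, y)
    per_item = F.cross_entropy(hcn(x), y, reduction="none")
    loss = (w * per_item).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

Under this rule, items the LCN classifies with high confidence (likely shortcut-driven) contribute little to the HCN's loss, while items the LCN cannot master retain close to full weight.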

Cited By

  • (2024) All You Need Is a Guiding Hand: Mitigating Shortcut Bias in Deep Learning Models for Medical Imaging. In: Ethics and Fairness in Medical Imaging, pp. 67-77. DOI: 10.1007/978-3-031-72787-0_7. Online publication date: 6 October 2024.

      Published In

      Pattern Recognition Letters, Volume 166, Issue C, February 2023 (218 pages)

      Publisher

      Elsevier Science Inc., United States

      Author Tags

      1. Shortcut learning
      2. Out-of-distribution generalization
      3. Robustness
      4. Deep learning

      Qualifiers

      • Research-article
