DOI: 10.1145/3377811.3380400

Testing DNN image classifiers for confusion & bias errors

Published: 01 October 2020
Abstract

    Image classifiers are an important component of today's software, from consumer and business applications to safety-critical domains. The advent of Deep Neural Networks (DNNs) is the key catalyst behind such widespread success. However, wide adoption comes with serious concerns about the robustness of software systems that depend on DNNs for image classification, as several severe erroneous behaviors have been reported under sensitive and critical circumstances. We argue that developers need to rigorously test their software's image classifiers and delay deployment until the classifiers' robustness is acceptable. We present an approach to testing image classifier robustness based on class property violations.
    We found that many of the reported erroneous cases in popular DNN image classifiers occur because the trained models confuse one class with another or show biases towards some classes over others. These bugs usually violate some class properties of one or more of those classes. Most DNN testing techniques focus on per-image violations and so fail to detect class-level confusions or biases.
    We developed a testing technique to automatically detect class-based confusion and bias errors in DNN-driven image classification software. We evaluated our implementation, DeepInspect, on several popular image classifiers, achieving precision up to 100% (avg. 72.6%) for confusion errors and up to 84.3% (avg. 66.8%) for bias errors. DeepInspect found hundreds of classification mistakes in widely-used models, many exposing errors indicating confusion or bias.




    Published In

    ICSE '20: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering
    June 2020
    1640 pages
    ISBN:9781450371216
    DOI:10.1145/3377811
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    In-Cooperation

    • KIISE: Korean Institute of Information Scientists and Engineers
    • IEEE CS

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. DNNs
    2. bias
    3. deep learning
    4. image classifiers
    5. whitebox testing

    Qualifiers

    • Research-article

    Conference

    ICSE '20

    Acceptance Rates

    Overall Acceptance Rate 276 of 1,856 submissions, 15%


