research-article

Distribution-Aware Testing of Neural Networks Using Generative Models

Authors:

Matthew B. Dwyer,

Mary Lou Soffa

ICSE '21: Proceedings of the 43rd International Conference on Software Engineering

Pages 226 - 237

https://doi.org/10.1109/ICSE43902.2021.00032

Published: 05 November 2021

Abstract

The reliability of software that has a Deep Neural Network (DNN) as a component is urgently important today, given the increasing number of critical applications being deployed with DNNs. This need for reliability calls for rigorous testing of the safety and trustworthiness of these systems. In the last few years, a number of research efforts have focused on testing DNNs. However, the test generation techniques proposed so far lack a check to determine whether the test inputs they generate are valid, and thus they produce invalid inputs. To illustrate this situation, we explored three recent DNN testing techniques. Using deep generative model-based input validation, we show that all three techniques generate a significant number of invalid test inputs. We further analyzed the test coverage achieved by the test inputs generated by the DNN testing techniques and showed how invalid test inputs can falsely inflate test coverage metrics.
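
The validity check and the coverage inflation described above can be illustrated with a minimal sketch. It is not the authors' implementation: a diagonal Gaussian serves as a toy stand-in for a deep generative model, the 1% log-density quantile is an assumed validity threshold, and the four-neuron layer, the test inputs, and all function names are hypothetical.

```python
import math
import random

def fit_gaussian(samples):
    """Fit a diagonal Gaussian: a toy stand-in for a deep generative model."""
    n, dims = len(samples), len(samples[0])
    means = [sum(s[d] for s in samples) / n for d in range(dims)]
    variances = [max(sum((s[d] - means[d]) ** 2 for s in samples) / n, 1e-6)
                 for d in range(dims)]
    return means, variances

def log_density(x, model):
    """Log-density of x under the fitted diagonal Gaussian."""
    means, variances = model
    return sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
               for xi, m, v in zip(x, means, variances))

def validity_threshold(train, model, quantile=0.01):
    """Assumed rule: inputs scoring below the 1% training quantile are invalid."""
    scores = sorted(log_density(x, model) for x in train)
    return scores[int(quantile * len(scores))]

def neuron_coverage(inputs, weights, biases):
    """Fraction of neurons with a positive pre-activation on at least one input."""
    covered = set()
    for x in inputs:
        for j, (w, b) in enumerate(zip(weights, biases)):
            if sum(wi * xi for wi, xi in zip(w, x)) + b > 0:
                covered.add(j)
    return len(covered) / len(weights)

def demo():
    rng = random.Random(0)
    train = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(1000)]
    model = fit_gaussian(train)
    threshold = validity_threshold(train, model)

    # Two in-distribution tests and two far-away tests (hypothetical data).
    tests = [(0.5, 0.2), (-0.3, 0.8), (10.0, 10.0), (-10.0, 10.0)]
    valid = [x for x in tests if log_density(x, model) >= threshold]

    # Hand-written layer: neurons 2 and 3 only fire far from the training data.
    weights = [(1, 0), (0, 1), (1, 1), (-1, 1)]
    biases = [0, 0, -8, -8]
    return (valid,
            neuron_coverage(tests, weights, biases),
            neuron_coverage(valid, weights, biases))

valid, coverage_all, coverage_valid = demo()
print(len(valid), coverage_all, coverage_valid)
```

In this toy setting, the two far-away inputs are rejected by the density check, and coverage drops from 1.0 over all tests to 0.5 over the valid ones, because two neurons are activated only by the invalid inputs.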

To overcome the inclusion of invalid inputs in testing, we propose a technique to incorporate the valid input space of the DNN model under test in the test generation process. Our technique uses a deep generative model-based algorithm to generate only valid inputs. Results of our empirical studies show that our technique is effective in eliminating invalid tests and boosting the number of valid test inputs generated.

Index Terms

Distribution-Aware Testing of Neural Networks Using Generative Models
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Neural networks
2. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Software defect analysis
        1. Software testing and debugging
  2. Software organization and properties

Index terms have been assigned to the content through auto-classification.

Information & Contributors

Information

Published In

ICSE '21: Proceedings of the 43rd International Conference on Software Engineering

May 2021

1768 pages

ISBN: 9781450390859

Publisher

IEEE Press

Qualifiers

Research-article
Research
Refereed limited

Conference

ICSE '21

Sponsor:

SIGSOFT

ICSE '21: 43rd International Conference on Software Engineering

May 22 - 30, 2021

Madrid, Spain

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%
