DOI: 10.1109/ICSE43902.2021.00032

Distribution-Aware Testing of Neural Networks Using Generative Models

Published: 05 November 2021
    Abstract

    The reliability of software that has a Deep Neural Network (DNN) as a component is urgently important today, given the increasing number of critical applications being deployed with DNNs. This need for reliability calls for rigorous testing of the safety and trustworthiness of these systems. In the last few years, a number of research efforts have focused on testing DNNs. However, the test generation techniques proposed so far lack a check on whether the test inputs they generate are valid, and so they produce invalid inputs. To illustrate this situation, we explored three recent DNN testing techniques. Using deep generative model-based input validation, we show that all three techniques generate a significant number of invalid test inputs. We further analyzed the test coverage achieved by the test inputs generated by these techniques and showed how invalid test inputs can falsely inflate test coverage metrics.
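The idea of density-based input validation, and of coverage being inflated by invalid inputs, can be illustrated with a small sketch. This is not the paper's implementation: a diagonal Gaussian fitted to toy 2-D data stands in for the deep generative model, the validity threshold (the 1st percentile of training log-densities) is an assumed choice, and the four hand-written "neurons" and the helper names `fit_diag_gaussian`, `log_density`, and `neuron_coverage` are hypothetical.

```python
import math
import random


def fit_diag_gaussian(data):
    """Fit per-dimension mean and variance (toy stand-in for a deep generative model)."""
    n, d = len(data), len(data[0])
    mean = [sum(x[j] for x in data) / n for j in range(d)]
    var = [max(sum((x[j] - mean[j]) ** 2 for x in data) / n, 1e-6) for j in range(d)]
    return mean, var


def log_density(x, model):
    """Log-density of x under the fitted diagonal Gaussian."""
    mean, var = model
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def neuron_coverage(inputs, neurons):
    """Fraction of neurons (weights, bias) with positive pre-activation on at least one input."""
    covered = set()
    for x in inputs:
        for k, (w, b) in enumerate(neurons):
            if sum(wi * xi for wi, xi in zip(w, x)) + b > 0:
                covered.add(k)
    return len(covered) / len(neurons)


# Toy "training distribution": 500 points from a 2-D standard normal.
rng = random.Random(0)
train = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(500)]
model = fit_diag_gaussian(train)

# Assumed validity rule: at least as likely as the 1st percentile of training data.
scores = sorted(log_density(x, model) for x in train)
threshold = scores[len(scores) // 100]

# Hypothetical neurons; the last one only fires far outside the training data.
neurons = [([1, 0], 0), ([0, 1], 0), ([-1, 0], 0), ([1, 1], -10)]

# Hypothetical generated tests; the last one lies far from the training data.
generated = [[0.1, -0.2], [0.5, 0.3], [-0.4, 0.2], [8.0, 8.0]]
valid = [x for x in generated if log_density(x, model) >= threshold]

cov_all = neuron_coverage(generated, neurons)
cov_valid = neuron_coverage(valid, neurons)
print(f"coverage with all inputs: {cov_all}, with valid inputs only: {cov_valid}")
```

In this toy setup the out-of-distribution input `[8.0, 8.0]` is the only one that activates the fourth neuron, so coverage is 1.0 when it is counted and 0.75 once it is filtered out.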
    To overcome the inclusion of invalid inputs in testing, we propose a technique that incorporates the valid input space of the DNN model under test into the test generation process. Our technique uses a deep generative model-based algorithm to generate only valid inputs. Results of our empirical studies show that our technique is effective in eliminating invalid tests and boosting the number of valid test inputs generated.
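One simple way to picture validity-constrained generation is rejection filtering: perturb seed inputs and keep only candidates that the density model scores above a threshold. The sketch below is an illustration under stated assumptions, not the algorithm the paper proposes; the standard-normal scorer stands in for a learned deep generative model, and `generate_valid_tests`, its parameters, and the threshold value are hypothetical.

```python
import math
import random


def std_normal_log_density(x):
    """Toy validity score: log-density under a standard normal (stand-in for a learned model)."""
    return sum(-0.5 * (math.log(2 * math.pi) + xi * xi) for xi in x)


def generate_valid_tests(seeds, log_density, threshold, n_tests,
                         step=1.0, rng=None, max_tries=10_000):
    """Perturb random seeds and keep only candidates scoring at or above the threshold.

    Returns the accepted tests and the number of rejected (invalid) candidates.
    """
    rng = rng or random.Random(0)
    tests, rejected = [], 0
    for _ in range(max_tries):
        if len(tests) >= n_tests:
            break
        seed = rng.choice(seeds)
        candidate = [xi + rng.gauss(0.0, step) for xi in seed]
        if log_density(candidate) >= threshold:
            tests.append(candidate)   # candidate lies in the assumed valid region
        else:
            rejected += 1             # candidate is discarded as invalid
    return tests, rejected


seeds = [[0.0, 0.0], [1.0, -1.0]]     # hypothetical seed inputs
tests, rejected = generate_valid_tests(
    seeds, std_normal_log_density, threshold=-6.0, n_tests=20, step=1.5
)
print(f"kept {len(tests)} valid tests, rejected {rejected} invalid candidates")
```

Filtering after the fact wastes the rejected candidates; a generator that uses the density score to guide its search would avoid producing them in the first place, which is closer in spirit to incorporating the valid input space into the generation process.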

    Published In

    ICSE '21: Proceedings of the 43rd International Conference on Software Engineering
    May 2021
    1768 pages
    ISBN:9781450390859

    Publisher

    IEEE Press

    Author Tags

    1. deep learning
    2. deep neural networks
    3. input validation
    4. test coverage
    5. test generation

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICSE '21
    Acceptance Rates

    Overall Acceptance Rate 276 of 1,856 submissions, 15%

    Cited By

    • (2024) CIT4DNN: Generating Diverse and Rare Inputs for Neural Networks Using Latent Space Combinatorial Testing. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, pp. 1-13. DOI: 10.1145/3597503.3639106. Online publication date: 20-May-2024
    • (2023) Topological parallax. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 28155-28172. DOI: 10.5555/3666122.3667345. Online publication date: 10-Dec-2023
    • (2023) Hierarchical Distribution-aware Testing of Deep Learning. ACM Transactions on Software Engineering and Methodology, vol. 33, no. 2, pp. 1-35. DOI: 10.1145/3625290. Online publication date: 24-Sep-2023
    • (2023) Adopting Two Supervisors for Efficient Use of Large-Scale Remote Deep Neural Networks. ACM Transactions on Software Engineering and Methodology, vol. 33, no. 1, pp. 1-29. DOI: 10.1145/3617593. Online publication date: 23-Nov-2023
    • (2023) LaF: Labeling-free Model Selection for Automated Deep Neural Network Reusing. ACM Transactions on Software Engineering and Methodology, vol. 33, no. 1, pp. 1-28. DOI: 10.1145/3611666. Online publication date: 31-Jul-2023
    • (2023) DistXplore: Distribution-Guided Testing for Evaluating and Enhancing Deep Learning Systems. Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 68-80. DOI: 10.1145/3611643.3616266. Online publication date: 30-Nov-2023
    • (2023) Input Distribution Coverage: Measuring Feature Interaction Adequacy in Neural Network Testing. ACM Transactions on Software Engineering and Methodology, vol. 32, no. 3, pp. 1-48. DOI: 10.1145/3576040. Online publication date: 26-Apr-2023
    • (2023) Efficient and Effective Feature Space Exploration for Testing Deep Learning Systems. ACM Transactions on Software Engineering and Methodology, vol. 32, no. 2, pp. 1-38. DOI: 10.1145/3544792. Online publication date: 29-Mar-2023
    • (2023) DeepManeuver: Adversarial Test Generation for Trajectory Manipulation of Autonomous Vehicles. IEEE Transactions on Software Engineering, vol. 49, no. 10, pp. 4496-4509. DOI: 10.1109/TSE.2023.3301443. Online publication date: 1-Oct-2023
    • (2023) Assuring Safety-Critical Machine Learning-Enabled Systems: Challenges and Promise. Computer, vol. 56, no. 9, pp. 83-88. DOI: 10.1109/MC.2023.3266860. Online publication date: 1-Sep-2023
