DOI: 10.1145/3395363.3397357 · ISSTA Conference Proceedings
research-article

DeepGini: prioritizing massive tests to enhance the robustness of deep neural networks

Published: 18 July 2020

Abstract

Deep neural networks (DNNs) have been deployed in many software systems to assist in various classification tasks. Despite their remarkable effectiveness at classification, DNNs can also exhibit incorrect behaviors that lead to accidents and losses. Testing techniques that can detect incorrect DNN behaviors and improve DNN quality are therefore critically needed. However, the test oracle, which defines the correct output for a given input, is often unavailable in automated testing. To obtain oracle information, testing DNN-based systems usually requires expensive human effort to label test data, which significantly slows down quality assurance.
To mitigate this problem, we propose DeepGini, a test prioritization technique based on a statistical perspective of DNNs. This perspective reduces the problem of measuring misclassification probability to the problem of measuring set impurity, which lets us quickly identify tests that are likely to be misclassified. To evaluate DeepGini, we conduct an extensive empirical study on popular datasets and prevalent DNN models. The experimental results demonstrate that DeepGini outperforms existing coverage-based techniques in both the effectiveness and the efficiency of test prioritization. Moreover, we observe that the tests DeepGini ranks at the front are more effective at improving DNN quality than those prioritized by the coverage-based techniques.
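The impurity-based ranking described above can be sketched in a few lines. This is a minimal sketch, not the authors' implementation: it assumes each test's score is the Gini impurity 1 - sum_i p_i^2 of the model's softmax output vector (p_1, ..., p_N), which equals the probability that two labels drawn independently from that distribution disagree. A confident prediction (all mass on one class) scores 0, while a uniform one scores the maximum 1 - 1/N, so it is ranked first for human labeling. The function names `gini_impurity` and `prioritize` are hypothetical, and the softmax outputs are assumed to have been computed beforehand by whatever framework hosts the DNN.

```python
from typing import List, Sequence


def gini_impurity(probs: Sequence[float]) -> float:
    """Gini impurity 1 - sum_i p_i^2 of one softmax output vector.

    Assumed scoring rule (hypothetical helper): 0 when all probability mass is
    on a single class, maximal (1 - 1/N) when the N classes are equally likely.
    """
    return 1.0 - sum(p * p for p in probs)


def prioritize(outputs: Sequence[Sequence[float]]) -> List[int]:
    """Return test indices ordered from most to least impure.

    `outputs[i]` is the (assumed precomputed) softmax vector for unlabeled test i.
    Higher impurity is treated as a higher chance of misclassification, so those
    tests come first. sorted() is stable even with reverse=True, so tests with
    equal scores keep their original input order.
    """
    scores = [gini_impurity(p) for p in outputs]
    return sorted(range(len(outputs)), key=lambda i: scores[i], reverse=True)
```

For example, `prioritize([[0.98, 0.01, 0.01], [0.34, 0.33, 0.33], [0.6, 0.4, 0.0]])` returns `[1, 2, 0]`: the near-uniform second output (impurity about 0.667) is sent for labeling first and the confident first output (about 0.039) last. The sketch needs only the final-layer outputs of one forward pass per test and no internal neuron activations.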

    Information & Contributors

    Information

    Published In

    ACM Conferences
    ISSTA 2020: Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis
    July 2020
    591 pages
    ISBN:9781450380089
    DOI:10.1145/3395363
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 18 July 2020

    Author Tags

    1. Deep Learning
    2. Deep Learning Testing
    3. Test Case Prioritization

    Qualifiers

    • Research-article

    Conference

    ISSTA '20

    Acceptance Rates

    Overall Acceptance Rate 58 of 213 submissions, 27%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 232
    • Downloads (last 6 weeks): 19
    Reflects downloads up to 25 Jan 2025

    Citations

    Cited By

    • (2024) DeepLogic: Priority Testing of Deep Learning Through Interpretable Logic Units. Chinese Journal of Electronics 33:4 (948-964). DOI: 10.23919/cje.2022.00.451. Online publication date: Jul-2024
    • (2024) Semantic feature-based test selection for deep neural networks: A frequency domain perspective. Computer Science and Information Systems 21:4 (1499-1522). DOI: 10.2298/CSIS230907045J. Online publication date: 2024
    • (2024) Prioritizing Test Inputs for DNNs Using Training Dynamics. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (1219-1231). DOI: 10.1145/3691620.3695498. Online publication date: 27-Oct-2024
    • (2024) FAST: Boosting Uncertainty-based Test Prioritization Methods for Neural Networks via Feature Selection. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (895-906). DOI: 10.1145/3691620.3695472. Online publication date: 27-Oct-2024
    • (2024) Context-Aware Fuzzing for Robustness Enhancement of Deep Learning Models. ACM Transactions on Software Engineering and Methodology 34:1 (1-68). DOI: 10.1145/3680464. Online publication date: 24-Jul-2024
    • (2024) GIST: Generated Inputs Sets Transferability in Deep Learning. ACM Transactions on Software Engineering and Methodology 33:8 (1-38). DOI: 10.1145/3672457. Online publication date: 13-Jun-2024
    • (2024) Neuron Sensitivity-Guided Test Case Selection. ACM Transactions on Software Engineering and Methodology 33:7 (1-32). DOI: 10.1145/3672454. Online publication date: 12-Jun-2024
    • (2024) Keeper: Automated Testing and Fixing of Machine Learning Software. ACM Transactions on Software Engineering and Methodology 33:7 (1-33). DOI: 10.1145/3672451. Online publication date: 13-Jun-2024
    • (2024) Can Coverage Criteria Guide Failure Discovery for Image Classifiers? An Empirical Study. ACM Transactions on Software Engineering and Methodology 33:7 (1-28). DOI: 10.1145/3672446. Online publication date: 13-Jun-2024
    • (2024) ObjTest: Object-Level Mutation for Testing Object Detection Systems. Proceedings of the 15th Asia-Pacific Symposium on Internetware (61-70). DOI: 10.1145/3671016.3671400. Online publication date: 24-Jul-2024
