DOI: 10.1145/3293882.3330579
research-article

DeepHunter: a coverage-guided fuzz testing framework for deep neural networks

Published: 10 July 2019

Abstract

The past decade has shown the great potential of applying deep neural network (DNN) based software to safety-critical scenarios such as autonomous driving. Like traditional software, DNNs can exhibit incorrect behaviors caused by hidden defects, leading to severe accidents and losses. In this paper, we propose DeepHunter, a coverage-guided fuzz testing framework for detecting potential defects in general-purpose DNNs. To this end, we first propose a metamorphic mutation strategy that generates new, semantically preserved tests, and we leverage multiple extensible coverage criteria as feedback to guide test generation. We further propose a seed selection strategy that combines diversity-based and recency-based seed selection. We implement and incorporate 5 existing testing criteria and 4 seed selection strategies in DeepHunter. Large-scale experiments demonstrate that (1) our metamorphic mutation strategy is useful for generating new valid tests with the same semantics as the original seed, achieving a validity ratio of up to 98%; (2) diversity-based seed selection generally contributes more than recency-based seed selection to boosting coverage and detecting defects; (3) DeepHunter outperforms the state of the art in coverage as well as in the quantity and diversity of defects identified; (4) guided by corner-region based criteria, DeepHunter is useful for capturing defects introduced during DNN quantization for platform migration.
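
The loop described in the abstract can be illustrated with a minimal sketch. It is not DeepHunter's implementation: the toy model, the neuron-coverage threshold, the brightness-style mutation, the drift budget, and the seed-weighting formula are all assumptions chosen only to show how a metamorphic mutation, coverage feedback, and a seed selection that mixes diversity and recency might fit together.

```python
import random

THRESHOLD = 0.5  # assumed activation level above which a neuron counts as covered
BUDGET = 0.2     # assumed max per-value drift from the original seed (validity bound)


def make_toy_model(rng, n_in=4, n_hidden=8, n_out=3):
    """Hypothetical stand-in for a DNN: one ReLU hidden layer with fixed random weights."""
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]

    def model(x):
        hidden = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in w1]
        logits = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
        label = max(range(n_out), key=logits.__getitem__)
        return label, hidden

    return model


def covered_neurons(hidden):
    """Basic neuron coverage: indices of hidden units activated above THRESHOLD."""
    return {i for i, h in enumerate(hidden) if h > THRESHOLD}


def mutate(x, rng, max_shift=0.1):
    """Brightness-style shift: add one small constant to every value, clipped to [0, 1]."""
    delta = rng.uniform(-max_shift, max_shift)
    return [min(1.0, max(0.0, v + delta)) for v in x]


def pick_seed(queue, rng):
    """Assumed weighting: rarely fuzzed seeds (diversity) and newer seeds (recency) are favored."""
    weights = [
        1.0 / (1 + s["times_fuzzed"]) + (idx + 1) / len(queue)
        for idx, s in enumerate(queue)
    ]
    return rng.choices(queue, weights=weights, k=1)[0]


def fuzz(model, initial_inputs, iterations=200, seed=0):
    rng = random.Random(seed)
    queue, coverage, failures = [], set(), []
    for x in initial_inputs:
        label, hidden = model(x)
        coverage |= covered_neurons(hidden)
        queue.append({"input": x, "origin": x, "label": label, "times_fuzzed": 0})

    for _ in range(iterations):
        s = pick_seed(queue, rng)
        s["times_fuzzed"] += 1
        candidate = mutate(s["input"], rng)
        # Discard mutants that drift too far from the original seed (treated as invalid).
        if max(abs(a - b) for a, b in zip(candidate, s["origin"])) > BUDGET:
            continue
        label, hidden = model(candidate)
        if label != s["label"]:
            # Metamorphic relation violated: the label should have been preserved.
            failures.append((candidate, s["label"], label))
            continue
        new = covered_neurons(hidden) - coverage
        if new:
            # The mutant reached uncovered neurons, so keep it as a new seed.
            coverage |= new
            queue.append({"input": candidate, "origin": s["origin"],
                          "label": s["label"], "times_fuzzed": 0})
    return coverage, failures, queue


if __name__ == "__main__":
    rng = random.Random(1)
    model = make_toy_model(rng)
    seeds = [[rng.random() for _ in range(4)] for _ in range(3)]
    coverage, failures, queue = fuzz(model, seeds)
    print(len(coverage), "neurons covered;", len(failures), "label changes;",
          len(queue), "seeds in queue")
```

In this sketch a mutant is kept only when it stays within the drift budget, preserves the seed's label, and activates a previously uncovered neuron; a label change within the budget is reported as a candidate defect. Swapping `covered_neurons` for another criterion, or `pick_seed` for another strategy, leaves the loop unchanged.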

Published In

ISSTA 2019: Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis
July 2019
451 pages
ISBN: 9781450362245
DOI: 10.1145/3293882
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Deep learning testing
  2. coverage-guided fuzzing
  3. metamorphic testing

Qualifiers

  • Research-article

Conference

ISSTA '19

Acceptance Rates

Overall Acceptance Rate 58 of 213 submissions, 27%

Article Metrics

  • Downloads (Last 12 months): 519
  • Downloads (Last 6 weeks): 46
Reflects downloads up to 22 Sep 2024

Cited By

  • (2024) Software Vulnerability Fuzz Testing: A Mutation-Selection Optimization Systematic Review. Engineering, Technology & Applied Science Research 14:4 (14961-14969). DOI: 10.48084/etasr.6971. Online publication date: 2-Aug-2024
  • (2024) DeepSI: A Sensitive-Driven Testing Samples Generation Method of Whitebox CNN Model for Edge Computing. Tsinghua Science and Technology 29:3 (784-794). DOI: 10.26599/TST.2023.9010057. Online publication date: Jun-2024
  • (2024) Test Case Prioritization with Z-Score Based Neuron Coverage. 2024 26th International Conference on Advanced Communications Technology (ICACT) (23-28). DOI: 10.23919/ICACT60172.2024.10471933. Online publication date: 4-Feb-2024
  • (2024) Automated Testing Linguistic Capabilities of NLP Models. ACM Transactions on Software Engineering and Methodology 33:7 (1-33). DOI: 10.1145/3672455. Online publication date: 14-Jun-2024
  • (2024) Keeper: Automated Testing and Fixing of Machine Learning Software. ACM Transactions on Software Engineering and Methodology 33:7 (1-33). DOI: 10.1145/3672451. Online publication date: 13-Jun-2024
  • (2024) Can Coverage Criteria Guide Failure Discovery for Image Classifiers? An Empirical Study. ACM Transactions on Software Engineering and Methodology 33:7 (1-28). DOI: 10.1145/3672446. Online publication date: 13-Jun-2024
  • (2024) Py-holmes: Causal Testing for Deep Neural Networks in Python. Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering (602-606). DOI: 10.1145/3663529.3663807. Online publication date: 10-Jul-2024
  • (2024) On Polyglot Program Testing. Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering (507-511). DOI: 10.1145/3663529.3663787. Online publication date: 10-Jul-2024
  • (2024) See the Forest, not Trees: Unveiling and Escaping the Pitfalls of Error-Triggering Inputs in Neural Network Testing. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (1605-1617). DOI: 10.1145/3650212.3680385. Online publication date: 11-Sep-2024
  • (2024) Enhancing Multi-agent System Testing with Diversity-Guided Exploration and Adaptive Critical State Exploitation. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (1491-1503). DOI: 10.1145/3650212.3680376. Online publication date: 11-Sep-2024
