DeepHunter: a coverage-guided fuzz testing framework for deep neural networks

Authors:

Felix Juefei-Xu,

Simon See

ISSTA 2019: Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis

Pages 146 - 157

https://doi.org/10.1145/3293882.3330579

Published: 10 July 2019

Abstract

The past decade has shown the great potential of deep neural network (DNN) based software in safety-critical scenarios such as autonomous driving. Like traditional software, DNNs can exhibit incorrect behaviors caused by hidden defects, leading to severe accidents and losses. In this paper, we propose DeepHunter, a coverage-guided fuzz testing framework for detecting potential defects of general-purpose DNNs. To this end, we first propose a metamorphic mutation strategy that generates new, semantics-preserving tests, and we leverage multiple extensible coverage criteria as feedback to guide test generation. We further propose a seed selection strategy that combines diversity-based and recency-based seed selection. We implement and incorporate 5 existing testing criteria and 4 seed selection strategies in DeepHunter. Large-scale experiments demonstrate that (1) our metamorphic mutation strategy is useful for generating new valid tests with the same semantics as the original seed, with a validity ratio of up to 98%; (2) diversity-based seed selection generally contributes more than recency-based seed selection to boosting coverage and detecting defects; (3) DeepHunter outperforms the state of the art in coverage as well as in the quantity and diversity of defects identified; and (4) guided by corner-region based criteria, DeepHunter is useful for capturing defects that arise during DNN quantization for platform migration.
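The loop described in the abstract can be illustrated with a small, self-contained sketch. It is not DeepHunter's implementation: `toy_model`, the neuron-coverage threshold, the perturbation budget, and the 50/50 mix of recency and diversity in `select_seed` are all hypothetical choices made only to show how mutation, coverage feedback, and seed selection might fit together.

```python
import random

def toy_model(x):
    """Hypothetical stand-in for a DNN: returns (label, hidden activations)."""
    hidden = [max(0.0, x[0] + x[1] - 1.0),
              max(0.0, x[0] - x[1]),
              max(0.0, 2.0 * x[1] - 0.5)]
    label = 1 if hidden[0] + hidden[2] > hidden[1] + 0.5 else 0
    return label, hidden

def covered(hidden, threshold=0.5):
    """Simple neuron coverage: indices of neurons activated above threshold."""
    return {i for i, a in enumerate(hidden) if a > threshold}

def mutate(x, origin, rng, step=0.05, budget=0.2):
    """Small random perturbation, kept within `budget` of the original seed.
    Assumption: staying inside this budget preserves the input's semantics."""
    out = []
    for v, o in zip(x, origin):
        lo, hi = max(0.0, o - budget), min(1.0, o + budget)
        out.append(min(hi, max(lo, v + rng.uniform(-step, step))))
    return out

def select_seed(queue, rng, p_recent=0.5):
    """Assumed mix: newest seed (recency) or least-fuzzed seed (diversity)."""
    if rng.random() < p_recent:
        return queue[-1]
    return min(queue, key=lambda s: s["picked"])

def fuzz(seeds, iterations=200, rng_seed=0):
    rng = random.Random(rng_seed)
    queue, total, failures = [], set(), []
    for x in seeds:
        label, hidden = toy_model(x)
        total |= covered(hidden)
        queue.append({"input": x, "origin": x, "label": label, "picked": 0})
    for _ in range(iterations):
        seed = select_seed(queue, rng)
        seed["picked"] += 1
        child = mutate(seed["input"], seed["origin"], rng)
        label, hidden = toy_model(child)
        if label != seed["label"]:
            # Metamorphic relation violated: same semantics, different output.
            failures.append((seed["origin"], child))
        elif covered(hidden) - total:
            # Coverage feedback: keep mutants that reach new neurons.
            total |= covered(hidden)
            queue.append({"input": child, "origin": seed["origin"],
                          "label": seed["label"], "picked": 0})
    return total, queue, failures

if __name__ == "__main__":
    cov, queue, failures = fuzz([[0.2, 0.1], [0.6, 0.5]])
    print(len(cov), len(queue), len(failures))
```

Each mutant inherits the label of its original seed, so a changed prediction is flagged without a separate oracle. A real coverage criterion and a trained network would replace `covered` and `toy_model`.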

Index Terms

DeepHunter: a coverage-guided fuzz testing framework for deep neural networks
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Neural networks
2. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Software defect analysis
        Software testing and debugging

Published In

ISSTA 2019: Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis

July 2019

451 pages

ISBN:9781450362245

DOI:10.1145/3293882

General Chair: Dongmei Zhang (Microsoft Research, China)

Program Chair: Anders Møller (Aarhus University, Denmark)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Research Foundation Singapore

Conference

ISSTA '19

Sponsor:

SIGSOFT

ISSTA '19: 28th ACM SIGSOFT International Symposium on Software Testing and Analysis

July 15 - 19, 2019

Beijing, China

Acceptance Rates

Overall Acceptance Rate 58 of 213 submissions, 27%

Cited By

Assiri F, Aljahdali A (2024). Software Vulnerability Fuzz Testing: A Mutation-Selection Optimization Systematic Review. Engineering, Technology & Applied Science Research 14:4 (14961-14969). Online publication date: 2-Aug-2024.
https://doi.org/10.48084/etasr.6971

Lian Z, Tian F (2024). DeepSI: A Sensitive-Driven Testing Samples Generation Method of Whitebox CNN Model for Edge Computing. Tsinghua Science and Technology 29:3 (784-794). Online publication date: Jun-2024.
https://doi.org/10.26599/TST.2023.9010057

Hwang H, Shin J (2024). Test Case Prioritization with Z-Score Based Neuron Coverage. 2024 26th International Conference on Advanced Communications Technology (ICACT) (23-28). Online publication date: 4-Feb-2024.
https://doi.org/10.23919/ICACT60172.2024.10471933

Lee J, Chen S, Mordahl A, Liu C, Yang W, Wei S (2024). Automated Testing Linguistic Capabilities of NLP Models. ACM Transactions on Software Engineering and Methodology 33:7 (1-33). Online publication date: 14-Jun-2024.
https://dl.acm.org/doi/10.1145/3672455

Wan C, Liu S, Xie S, Liu Y, Hoffmann H, Maire M, Lu S (2024). Keeper: Automated Testing and Fixing of Machine Learning Software. ACM Transactions on Software Engineering and Methodology 33:7 (1-33). Online publication date: 13-Jun-2024.
https://dl.acm.org/doi/10.1145/3672451

Wang Z, Xu S, Fan L, Cai X, Li L, Liu Z (2024). Can Coverage Criteria Guide Failure Discovery for Image Classifiers? An Empirical Study. ACM Transactions on Software Engineering and Methodology 33:7 (1-28). Online publication date: 13-Jun-2024.
https://dl.acm.org/doi/10.1145/3672446

McQueary W, Mim S, Raihan M, Smith J, Johnson B, d'Amorim M (2024). Py-holmes: Causal Testing for Deep Neural Networks in Python. Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering (602-606). Online publication date: 10-Jul-2024.
https://dl.acm.org/doi/10.1145/3663529.3663807

Houdaille P, Khelladi D, Combemale B, Mussbacher G, d'Amorim M (2024). On Polyglot Program Testing. Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering (507-511). Online publication date: 10-Jul-2024.
https://dl.acm.org/doi/10.1145/3663529.3663787

Yuan Y, Wang S, Su Z, Christakis M, Pradel M (2024). See the Forest, not Trees: Unveiling and Escaping the Pitfalls of Error-Triggering Inputs in Neural Network Testing. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (1605-1617). Online publication date: 11-Sep-2024.
https://dl.acm.org/doi/10.1145/3650212.3680385

Ma X, Wang Y, Wang J, Xie X, Wu B, Li S, Xu F, Wang Q, Christakis M, Pradel M (2024). Enhancing Multi-agent System Testing with Diversity-Guided Exploration and Adaptive Critical State Exploitation. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (1491-1503). Online publication date: 11-Sep-2024.
https://dl.acm.org/doi/10.1145/3650212.3680376