
DLFuzz: differential fuzzing testing of deep learning systems

Authors:

Jiaguang Sun

ESEC/FSE 2018: Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering

Pages 739 - 743

https://doi.org/10.1145/3236024.3264835

Published: 26 October 2018

Abstract

Deep learning (DL) systems are increasingly applied in safety-critical domains such as autonomous driving, so ensuring their reliability and robustness is of significant importance. Existing testing methodologies always fail to include rare inputs in the testing dataset and exhibit low neuron coverage. In this paper, we propose DLFuzz, the first differential fuzzing testing framework that guides DL systems to expose incorrect behaviors. DLFuzz repeatedly applies minute mutations to the input to maximize both the neuron coverage and the prediction difference between the original input and the mutated input, without manual labeling effort or cross-referencing oracles from other DL systems with the same functionality. We present empirical evaluations on two well-known datasets to demonstrate its efficiency. Compared with DeepXplore, the state-of-the-art DL whitebox testing framework, DLFuzz requires no extra effort to find functionally similar DL systems for cross-referencing checks, yet it generates 338.59% more adversarial inputs with 89.82% smaller perturbations, obtains 2.86% higher neuron coverage on average, and reduces time consumption by 20.11%.
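The idea summarized in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation. The abstract does not state how the objective is optimized, so the sketch assumes a random hill-climbing search; the tiny NumPy network, the activation threshold of 0.5, the weighting factor `lam`, the L2 budget `max_l2`, and the names `forward` and `fuzz` are all hypothetical choices made for illustration. The model's prediction on the original input serves as the oracle, so no manual label and no second model is needed.

```python
import numpy as np


def forward(x, weights):
    """Tiny ReLU MLP: return (hidden activations, softmax class probabilities)."""
    w1, w2 = weights
    hidden = np.maximum(0.0, x @ w1)
    logits = hidden @ w2
    exp = np.exp(logits - logits.max())  # shift logits for numerical stability
    return hidden, exp / exp.sum()


def fuzz(x, weights, rng, iters=300, step=0.05, threshold=0.5, max_l2=2.0, lam=0.5):
    """Mutate x in small steps to raise prediction difference and neuron coverage.

    Returns (mutated input, label_changed, coverage before, coverage after).
    """
    hidden, probs = forward(x, weights)
    label = int(probs.argmax())        # prediction on the original input is the oracle
    covered = hidden > threshold       # neurons activated above the (assumed) threshold
    start_coverage = covered.mean()

    def score(h, p):
        # Prediction difference plus mean activation of still-uncovered neurons.
        uncovered = ~covered
        bonus = h[uncovered].mean() if uncovered.any() else 0.0
        return (1.0 - p[label]) + lam * bonus

    best, best_score = x.copy(), score(hidden, probs)
    for _ in range(iters):
        cand = best + step * rng.standard_normal(x.shape)
        if np.linalg.norm(cand - x) > max_l2:
            continue                   # keep the total perturbation small
        h, p = forward(cand, weights)
        if score(h, p) <= best_score:
            continue                   # hill climbing: keep only improving mutations
        best = cand
        covered |= h > threshold       # record newly covered neurons
        best_score = score(h, p)       # re-score against the updated coverage
        if int(p.argmax()) != label:
            return best, True, start_coverage, covered.mean()
    return best, False, start_coverage, covered.mean()


# Usage with random (hypothetical) weights and a random seed input.
rng = np.random.default_rng(0)
weights = (rng.standard_normal((8, 16)), rng.standard_normal((16, 3)))
x = rng.standard_normal(8)
mutated, found, cov_before, cov_after = fuzz(x, weights, rng)
print(found, cov_before, cov_after, np.linalg.norm(mutated - x))
```

A mutated input whose predicted class differs from that of the original input is reported as a candidate adversarial input. The perturbation budget keeps the mutation small, which is what makes the disagreement between the two predictions meaningful.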


Index Terms

1. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Software defect analysis
        Software testing and debugging


Published In


ESEC/FSE 2018: Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering

October 2018

987 pages

ISBN:9781450355735

DOI:10.1145/3236024

General Chair: Gary T. Leavens (University of Central Florida, USA)

Program Chairs: Alessandro Garcia (PUC-Rio, Brazil), Corina S. Păsăreanu (NASA Ames Research Center, USA)

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ESEC/FSE '18

Sponsor: SIGSOFT

ESEC/FSE '18: 26th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

November 4 - 9, 2018

Lake Buena Vista, FL, USA

Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%

Bibliometrics

Total Citations: 203
Total Downloads: 1,846
Downloads (Last 12 months): 254
Downloads (Last 6 weeks): 19

Reflects downloads up to 30 Jan 2025

