DLFuzz: Differential Fuzzing Testing of Deep Learning Systems
Proceedings of the 2018 26th ACM Joint Meeting on European Software …, 2018
Deep learning (DL) systems are increasingly applied to safety-critical domains such as autonomous driving. Ensuring the reliability and robustness of DL systems is therefore critically important. Existing testing methodologies often fail to include rare inputs in the testing dataset and achieve low neuron coverage. In this paper, we propose DLFuzz, the first differential fuzzing testing framework designed to guide DL systems toward exposing incorrect behaviors. DLFuzz iteratively applies minute mutations to the input so as to maximize both neuron coverage and the prediction difference between the original and mutated inputs, without requiring manual labeling effort or cross-referencing oracles from other DL systems with the same functionality. We present empirical evaluations on two well-known datasets to demonstrate its efficiency. Compared with DeepXplore, the state-of-the-art DL whitebox testing framework, DLFuzz does not require the extra effort of finding DL systems with similar functionality for cross-referencing checks, yet generates 338.59% more adversarial inputs with 89.82% smaller perturbations, obtains 2.86% higher neuron coverage on average, and reduces time consumption by 20.11%.
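The abstract only summarizes the approach. Below is a minimal, illustrative sketch, not the authors' implementation, of how one gradient-guided mutation step combining a prediction-difference term with a neuron-activation term might look. It assumes a PyTorch classifier, a single hooked hidden layer as a crude stand-in for DLFuzz's neuron-selection strategies, and hypothetical hyperparameters (`lam`, `lr`, `steps`).

```python
# Illustrative sketch of a DLFuzz-style mutation step (NOT the authors' code).
# Assumptions: PyTorch model, inputs in [0, 1], one hooked hidden layer used
# as a simplified surrogate for the neuron-coverage objective.
import torch
import torch.nn as nn

def dlfuzz_step(model, hidden_layer, x, lam=1.0, lr=0.02, steps=5):
    """Mutate input `x` so that (a) the originally predicted class loses
    confidence and (b) activations of the hooked hidden layer grow."""
    acts = {}
    handle = hidden_layer.register_forward_hook(
        lambda mod, inp, out: acts.__setitem__("h", out))

    x_orig = x.detach()
    with torch.no_grad():
        orig_label = model(x_orig).argmax(dim=1)

    x_mut = x_orig.clone().requires_grad_(True)
    for _ in range(steps):
        logits = model(x_mut)
        # Prediction-difference term: push down the original class score.
        pred_diff = -logits.gather(1, orig_label.unsqueeze(1)).sum()
        # Coverage surrogate: raise activations of the hooked layer.
        coverage = acts["h"].sum()
        (pred_diff + lam * coverage).backward()
        with torch.no_grad():
            x_mut += lr * x_mut.grad.sign()   # small gradient-guided mutation
            x_mut.clamp_(0.0, 1.0)            # keep inputs in a valid range
        x_mut.grad = None

    handle.remove()
    with torch.no_grad():
        is_adv = (model(x_mut).argmax(dim=1) != orig_label).item()
    return x_mut.detach(), bool(is_adv)

# Tiny usage demo on a toy model and a random "image".
if __name__ == "__main__":
    torch.manual_seed(0)
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64),
                          nn.ReLU(), nn.Linear(64, 10))
    x = torch.rand(1, 1, 28, 28)
    x_adv, flipped = dlfuzz_step(model, model[2], x)  # hook the ReLU layer
    print("label flipped:", flipped)
```

In the actual DLFuzz workflow, the coverage term targets specifically selected uncovered neurons and mutated inputs are retained only while the perturbation stays small; the hooked layer's total activation above is used purely for illustration.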