Fuzzing: State of the Art
Hongliang Liang, Member, IEEE, Xiaoxiao Pei, Xiaodong Jia, Wuwei Shen, Member, IEEE,
and Jian Zhang, Senior Member, IEEE
Abstract—As one of the most popular software testing techniques, fuzzing can find a variety of weaknesses in a program, such
as software bugs and vulnerabilities, by generating numerous test
inputs. Due to its effectiveness, fuzzing is regarded as a valuable bug
hunting method. In this paper, we present an overview of fuzzing
that concentrates on its general process, as well as classifications,
followed by a detailed discussion of the key obstacles and some state-of-the-art technologies which aim to overcome or mitigate these
obstacles. We further investigate and classify several widely used
fuzzing tools. Our primary goal is to equip the stakeholder with a
better understanding of fuzzing and the potential solutions for improving fuzzing methods in the spectrum of software testing and
security. To inspire future research, we also predict some future
directions with regard to fuzzing.
Index Terms—Fuzzing, reliability, security, software testing,
survey.
I. INTRODUCTION
FUZZING (short for fuzz testing) is “an automatic testing
technique that covers numerous boundary cases using invalid data (from files, network protocols, application programming interface (API) calls, and other targets) as application input
to better ensure the absence of exploitable vulnerabilities” [1].
Fuzzing was first proposed by Miller et al. [2] in 1988, and
since then, it has developed into an effective, fast, and practical
method to find bugs in software [3]–[5]. The key idea behind
fuzzing is to generate and feed the target program with plenty of
test cases that are likely to trigger software errors. The most
creative part of fuzzing is the way suitable test cases are generated,
and current popular methods include coverage-guided strategies, genetic algorithms, symbolic execution, taint analysis, etc.
Based on these methods, a modern fuzzing technique can be intelligent enough to reveal hidden bugs. Therefore, as the only testing
method for which test success can be quantified in meaningful software-quality terms, fuzzing has an important theoretical
and experimental role. It serves as the standard of comparison
by which other methods should be evaluated [6]. Furthermore,
fuzzing has gradually evolved into a synthesis technique that synergistically combines static and dynamic information about a target
program so that better test cases can be produced and more bugs
are detected [7].
Fuzzing simulates attacks by constantly sending malformed
or semivalid test cases to the target program. Thanks to these
irregular inputs, fuzzers, also known as fuzzing tools, can often
find previously unknown vulnerabilities [8]–[11]. That is
one of the key reasons why fuzzing plays an important role in
software testing. However, the main drawback that fuzzing has been
trying to overcome is the blindness of its test case generation
process, which can lead to low code coverage. As mentioned above, several methods have been utilized to mitigate this
problem and fuzzing has made impressive progress. Nowadays, fuzzing has been widely applied to test various software,
including compilers, applications, network protocols, kernels,
etc., and multiple application areas, such as evaluation of syntax error recovery [12] and fault localization [13].
A. Motivation
There are two reasons that motivate us to write this overview.
1) Fuzzing is getting more and more attention in the area of
software security and reliability because of its effective
ability to find bugs. Many IT companies such as Google
and Microsoft are studying fuzzing techniques and further
developing fuzzing tools (e.g., SAGE [14], Syzkaller [15],
SunDew [16], etc.) to find bugs in the target program.
2) There has been no systematic review of fuzzing in the past
few years. Although some papers present overviews of
fuzzing, they are usually reviews of selected articles [1],
[17] or surveys on a specific testing topic [18], [19]. Therefore, we think it is necessary to write a comprehensive
review to summarize the latest methods and new research
results in this area. Thus, with this paper, we hope that
not only beginners can get a general understanding
of fuzzing but also professionals can obtain a thorough
review of the field.
B. Outline
The rest of this paper is organized as follows: Section II
presents the review methodology used in our survey as well
as a brief summary and analysis of some selected papers.
Section III describes the general process of fuzzing. Section IV
introduces the classification of fuzzing methods. Section V
describes the state of the art in fuzzing. Section VI provides
several popular fuzzers classified by their application areas
and problem domains. As a result of our survey, a number of
research challenges are identified as avenues for future research
in Section VII. Finally, Section VIII concludes this paper.
TABLE I: PUBLISHERS AND NUMBER OF PRIMARY STUDIES
II. REVIEW METHOD
To conduct a comprehensive survey on fuzzing, we followed
a systematic and structured method inspired by the guidelines of
Kitchenham [20] and Webster and Watson [21]. In the following
sections, we will introduce our research methods, collected data,
and analysis in detail.
A. Research Questions
This survey mainly aims to answer the following research
questions about fuzzing.
1) RQ1: What are the key problems and the corresponding
techniques in fuzzing research?
2) RQ2: What are the usable fuzzers and their known application domains?
3) RQ3: What are the future research opportunities or directions?
RQ1, which is answered in Section V, allows us to provide an
in-depth view of fuzzing, outlining the state-of-the-art advances in this area since its original introduction. RQ2, which is
discussed in Section VI, is proposed to give an insight into the
scope of fuzzing and its applicability to different domains. Finally, based on the answer to the previous questions, we expect
to identify unresolved problems and research opportunities in
response to RQ3, which is answered in Section VII.
B. Inclusion and Exclusion Criteria
We scrutinized the existing literature in order to find papers
related to all aspects of fuzzing, such as methods, tools, applications to specific testing problems, empirical evaluations,
and surveys. Articles written by the same authors with similar
content were intentionally classified and evaluated as separate
contributions for a more rigorous analysis. Then, we grouped
these articles with no major differences in the presentation
of results. We excluded those papers based on the following
criteria:
1) not related to the computer science field;
2) not written in English;
3) not published by a reputed publisher;
4) published by a reputed publisher but with fewer than six
pages;
5) not accessible via the Web.
For instance, using the search interface of Wiley InterScience
website with keywords, such as fuzzing/fuzz testing/fuzzer, we
obtained 32 papers, only seven of which are related to the computer
science field according to their abstracts.
C. Source Material and Search Strategy
In order to provide a complete survey covering all the publications relating to fuzzing, we constructed a fuzzing publication
repository, which includes more than 350 papers from January
1990 to June 2017, via three steps. First, we searched some main
online repositories such as IEEE XPlore, ACM Digital Library,
Springer Online Library, Wiley InterScience, USENIX and Elsevier ScienceDirect Online Library, and collected papers with
either “fuzz testing,” “fuzzing,” “fuzzer,” “random testing,” or
“swarm testing” as keywords in their titles, abstracts, or keywords. Second, we used abstracts of the collected papers to
exclude some of them based on our selection criteria. We read
a paper in full if its relevance could not be decided from its abstract. This
step was carried out by two different authors. The set of candidate papers was reduced to 171 publications within the scope of
our survey. These papers are referred to as the primary studies
[20]. Table I presents the number of primary studies retrieved
from each source.
It is still possible that our search does not completely cover
all the related papers, since we focused on a subset of reputed
publishers. However, we are confident that the overall trends in
this paper are accurate and provide a fair picture of the state of
the art on fuzzing.
D. Summary of Results
The following sections summarize our primary studies in
terms of publication trends, venues, authors, and geographical
distribution on fuzzing.
1) Publication Trends: Fig. 1(a) illustrates the number of
publications about fuzzing between January 1, 1990 and June 30,
2017. The graph shows that the number of papers on this topic
has increased steadily since 2004, especially after 2009. The
cumulative number of publications is illustrated in Fig. 1(b). We
calculated a close fit to a quadratic function with a high coefficient of determination (R² = 0.992), indicating strong polynomial growth, a sign of continued health and interest in this
subject. If the trend continues, there will be more than 200
fuzzing papers published by reputed publishers by the end of
2018, three decades after this technique was first introduced.
2) Publication Venues: The 171 primary studies were published in 78 distinct venues, which indicates that the fuzzing
literature covers a very wide range of fields, probably because this
technique is very practical and has been applied to multiple
testing, reliability, and security domains. Regarding the type of
venues, most papers were presented at conferences and symposia (73%), followed by journals (15%), workshops (9%), and
technical reports (3%). Table II lists the venues where at least
three fuzzing papers have been presented.
Fig. 1. Fuzzing papers published between January 1, 1990 and June 30, 2017. (a) Number of publications per year. (b) Cumulative number of publications per year.
TABLE II: TOP VENUES ON FUZZING
TABLE III: GEOGRAPHICAL DISTRIBUTION OF PUBLICATIONS
TABLE IV: TOP 10 COAUTHORS ON FUZZING
3) Geographical Distribution of Publications: We related
the geographical origin of each primary study to the affiliation country of its first coauthor. Interestingly, we found that
all 171 primary studies originated from 22 different countries
with USA, China, and Germany being the top three, as presented in Table III (only the countries with over four papers).
By continent, 43% of the papers originate from America, 32% from Europe, 20% from Asia, and 5% from Oceania. This suggests that the fuzzing community is formed by a
modest number of countries but fairly distributed around the
world.
4) Researchers and Organizations: We identified 125 distinct coauthors in the 171 primary studies under review.
Table IV presents the top authors on fuzzing and their most
recent affiliation.
III. GENERAL PROCESS OF FUZZING
Fuzzing is a software testing technique which can automatically generate test cases. These test cases are run on a
target program, and its behavior is observed to determine whether there is any bug or vulnerability
in the target program. The general process of fuzzing is shown
in Fig. 2.
Fig. 2. General process of fuzzing.
Target program: A target program is the program under test.
It could be either binary or source code. However, the source
code of real-world software usually cannot be accessed easily,
so fuzzers often target binary code.
Monitor: This component is generally built into a white-box
or gray-box fuzzer. A monitor leverages techniques such
as code instrumentation and taint analysis to acquire code
coverage, taint data flow, or other useful runtime information about
the target program. A monitor is not necessary in a black-box
fuzzer.
Test case generator: There are two main methods for fuzzers
to generate test cases: mutation-based and grammar-based [1]
methods. The first method generates test inputs by mutating
well-formed seed files randomly or by using predefined mutation strategies, which can be adjusted based on target-program-oriented information gathered at runtime. In contrast,
the second method does not need any seed file. It generates inputs from a specification (e.g., grammar). In many cases, the
test cases in fuzzing usually are semivalid inputs which are
valid enough to pass the early parsing stage and invalid enough
to trigger bugs in the deep logic of the target program.
Bug detector: To help users find potential bugs in a target program, a bug detector module is designed and implemented in a fuzzer. When the target program crashes or reports
some errors, the bug detector module collects and analyzes related information (e.g., stack traces [22]) to decide whether a
bug exists. Sometimes, a debugger can be used manually to
record exception information [23]–[25] as an alternative to this
module.
Bug filter: Testers usually focus on correctness- or security-related bugs. Therefore, filtering exploitable bugs (i.e., vulnerabilities) from all the reported bugs is an important task, usually
performed manually [23], which is not only time consuming but
hard to tackle as well. Currently, some research works [26] have
proposed various approaches to mitigate this problem. For example, by sorting the fuzzer’s outputs (bug-inducing test cases),
the diverse, interesting test cases are prioritized, and testers do
not need to manually search for wanted bugs, which is a process
like looking for a needle in a haystack.
To explain the process of fuzzing more clearly, as an example, we use American fuzzy lop (AFL) [27], a mutation-based
coverage-guided fuzzer, to test png2swf, a file converter. In this
example, we provide the png2swf executable file for AFL as
its target program. First, we provide a few seed files (the ideal
seeds should be well formed and of small size) for AFL since
it adopts the mutation-based technique. Second, we run AFL
by a simple command (e.g., “afl-fuzz -i [input_directory] -o
[output_directory] -Q -- [target_directory] (@@)”; if the target
program reads its input from a file, then “@@” is necessary). During
the testing process, the “Monitor” component of AFL collects
specific runtime information (path-coverage information in this
case) by binary instrumentation, and then passes this information
to the “Test Case Generator” component to help guide its test
case generation process. The general strategy is to save those
test cases which are able to cover new program paths for the
next round of mutation, and to discard the others. Third, the
newly generated test cases are passed back to the “Monitor” as
the inputs of the target program. This process continues until we
stop the AFL instance or a given time limit is reached. In addition,
AFL also prints some useful information on the screen during
the runtime, such as the execution time, the number of unique
crashes, the execution speed, etc. So, we may get a set of test
cases which can crash the target program. Finally, we analyze
the test cases and identify those bugs that crash the target program manually or with the help of other tools. The bugs that AFL
finds are mainly related to memory operations, such as buffer overflows, access violations, and stack smashing, which usually cause
the program to crash or may be exploited by attackers.
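To make the coverage-guided loop described above concrete, the following is a minimal sketch (not AFL's actual implementation) of how a mutation-based, coverage-guided fuzzer keeps test cases that reach new coverage and records crashing inputs; run_with_coverage is a hypothetical callback standing in for the instrumented execution of the target.

```python
import random

def mutate(data: bytes) -> bytes:
    """Toy mutation: flip one random bit of the input."""
    buf = bytearray(data)
    if buf:
        pos = random.randrange(len(buf))
        buf[pos] ^= 1 << random.randrange(8)
    return bytes(buf)

def fuzz_loop(run_with_coverage, seeds, rounds=100000):
    """Keep mutants that cover new edges; collect mutants that crash.

    run_with_coverage(data) is assumed to execute the instrumented
    target and return (set_of_covered_edges, crashed_flag)."""
    queue = list(seeds)        # corpus of interesting test cases
    seen_edges = set()         # coverage observed so far
    crashes = []
    for _ in range(rounds):
        child = mutate(random.choice(queue))
        edges, crashed = run_with_coverage(child)
        if crashed:
            crashes.append(child)
        elif not edges <= seen_edges:      # reached something new
            seen_edges |= edges
            queue.append(child)            # keep it for further mutation
    return queue, crashes
```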
IV. BLACK, WHITE, OR GRAY?
Fuzzing techniques can be divided into three kinds: black box,
white box, and gray box depending on how much information
they require from the target program at runtime [28]. The information can be code coverage, data-flow coverage, program’s
memory usage, CPU utilization, or any other information to
guide test case generation.
A. Black-Box Fuzzing
Traditional black-box fuzzing is also known as “black-box
random testing.” Without requiring any information about the
target program or input format, black-box random testing uses
some predefined rules to randomly mutate the given well-formed
seed file(s) to create malformed inputs. The mutation rules could
be bit flips, byte copies, or byte removals [28], etc. Recent
black-box fuzzing also utilizes grammar [29] or input-specific
knowledge [30] to generate semivalid inputs.
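As a hedged sketch of the mutation rules just mentioned (bit flips, byte copies, and byte removals), the operators below are illustrative rather than the exact rules of any particular fuzzer, and they assume a non-empty seed.

```python
import random

def bit_flip(data: bytes) -> bytes:
    """Flip one randomly chosen bit."""
    buf = bytearray(data)
    pos = random.randrange(len(buf))
    buf[pos] ^= 1 << random.randrange(8)
    return bytes(buf)

def byte_copy(data: bytes) -> bytes:
    """Overwrite one byte with the value of another byte."""
    buf = bytearray(data)
    src, dst = random.randrange(len(buf)), random.randrange(len(buf))
    buf[dst] = buf[src]
    return bytes(buf)

def byte_remove(data: bytes) -> bytes:
    """Delete one randomly chosen byte."""
    pos = random.randrange(len(data))
    return data[:pos] + data[pos + 1:]

def blackbox_mutate(seed: bytes) -> bytes:
    """A black-box fuzzer applies such rules blindly, with no feedback."""
    return random.choice([bit_flip, byte_copy, byte_remove])(seed)
```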
Black-box fuzzers, such as fuzz [31] and Trinity [32], are
popular in the software industry due to their effectiveness in finding
bugs and simplicity of use. For instance, Trinity aims to fuzz
the system call interfaces of the Linux kernel. Testers first describe the type of input using the provided templates.
Then, Trinity can generate more valid inputs which can reach higher coverage. This fuzzer has found plenty of bugs (see http://codemonkey.org.uk/projects/trinity/bugs-found.php).
Fig. 3. Example function.
However, the drawback of this technique is also obvious.
Consider the function shown in Fig. 3. The abort() function at
line 7 has only a 1/2³² chance of being reached when the input parameter at line 1 is assigned randomly. This example intuitively
explains why it is difficult for black-box fuzzing to generate test
cases that cover a large number of paths in a target program. Due
to this blindness nature, black-box fuzzing often provides low
code coverage in practice [33]. Hence, recent fuzzer developers have mainly focused on reverse engineering [34], code
instrumentation [35], taint analysis [23], [36], [37], and other
techniques to make fuzzers “smarter” and mitigate
this problem, which is why white-box and gray-box fuzzing
have received more attention recently.
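Fig. 3 is not reproduced here; assuming it resembles the classic magic-value check, the following hypothetical Python analogue shows why purely random inputs almost never reach the failing branch.

```python
import struct

def process(data: bytes) -> None:
    """Hypothetical stand-in for the function in Fig. 3: the error path
    is taken only for a single 32-bit value of the input parameter."""
    if len(data) < 4:
        return
    (x,) = struct.unpack("<I", data[:4])
    if x == 0xDEADBEEF:        # illustrative magic value
        # Reached with probability 1/2**32 under uniformly random inputs,
        # which is why black-box fuzzing rarely covers such branches.
        raise RuntimeError("abort() reached")
```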
B. White-Box Fuzzing
White-box fuzzing is based on the knowledge of internal logic
of the target program [17]. It uses a method which in theory can
explore all execution paths in the target program. It was first
proposed by Godefroid et al. [38]. In order to overcome the
blindness of black-box fuzzing, they explored an
alternative method, which they called white-box fuzzing [14]. By using dynamic symbolic execution (also known as concolic execution [39]) and a coverage-maximizing heuristic search algorithm,
white-box fuzzing can search the target program thoroughly and
quickly.
Unlike black-box fuzzing, white-box fuzzing requires information from the target program and uses the required information to guide test case generation. Specifically, starting execution with a given concrete input, a white-box fuzzer first gathers
symbolic constraints at all conditional statements along the execution path for that input. After one execution, the
white-box fuzzer combines all symbolic constraints using logical AND to form a path constraint (PC for short). Then, the
white-box fuzzer systematically negates one of the constraints
and solves the new PC. The new test case leads the program
to run a different execution path. Using a coverage-maximizing
heuristic search algorithm, white-box fuzzers can find bugs in
the target program as fast as possible [38].
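As a hedged illustration of the constraint-negation step (not SAGE's implementation), the sketch below uses the z3 SMT solver's Python bindings: the path constraint from one concrete run is collected, its last branch condition is negated, and the solver produces an input that drives execution down a different path. The concrete branch conditions and the magic value are invented for illustration.

```python
from z3 import And, BitVec, Not, Solver, sat   # pip install z3-solver

x = BitVec("x", 32)            # symbolic stand-in for a 32-bit input

# Branch conditions gathered along one concrete execution (illustrative).
path = [x > 10, x != 0x11223344]

# Negate the last condition while keeping the prefix: a solution, if
# one exists, is an input that follows a new execution path.
new_pc = And(path[:-1] + [Not(path[-1])])

solver = Solver()
solver.add(new_pc)
if solver.check() == sat:
    new_input = solver.model()[x].as_long()
    print(hex(new_input))      # 0x11223344, exercising the other branch
```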
In theory, white-box fuzzing can generate test cases which
cover all the program paths. In practice, however, due to many
problems such as the numerous execution paths in a real software system and the imprecision of constraint solving during
symbolic execution (see Section V-B for details), the code coverage of white-box fuzzing cannot achieve 100%. One of the
most famous white-box fuzzers is SAGE [14]. SAGE targets
large-scale Windows applications and uses several optimizations to deal with the huge number of execution traces. It can
automatically find software bugs and has achieved impressive
results.
C. Gray-Box Fuzzing
Gray-box fuzzing stands in the middle of black-box fuzzing
and white-box fuzzing to effectively reveal software errors with
partial knowledge of the target program. The commonly used
method in gray-box fuzzing is code instrumentation [40], [41].
By this means, a gray-box fuzzer can obtain code coverage of
the target program at runtime; then, it utilizes this information to
adjust (e.g., by using a genetic algorithm [28], [42]) its mutation
strategies to create test cases which may cover more execution
paths or find bugs faster. Another method used in gray-box
fuzzing is taint analysis [23], [43]–[45], which extends code instrumentation for tracing taint data flow. Therefore, a fuzzer can
focus on mutating specific fields of input which can influence
the potential attack points in the target program.
Gray-box and white-box fuzzing are quite similar in that both
methods make use of information of the target program to guide
test case generation. But there is also an obvious difference
between them: gray-box fuzzing only utilizes some runtime information (e.g., code coverage, taint data flow, etc.) of the target
program to decide which paths have been explored [28]. Also,
gray-box fuzzing only uses the acquired information to guide
test case generation, but it cannot guarantee that using this piece
of information will surely generate better test cases to cover new
paths or trigger specific bugs. By contrast, white-box fuzzing
utilizes the source code or binary code of the target program to
systematically explore all the execution paths. By using concolic
execution and a constraint solver, white-box fuzzing can make
sure that the generated test cases will lead the target program
to explore new execution paths. Thus, white-box fuzzing reduces the blindness of the fuzzing process more thoroughly.
In summary, both methods make use of information of the target program to mitigate blindness of black-box fuzzing, but to
different degrees.
BuzzFuzz [46] is a good example of how a gray-box
fuzzer works, although the developer of BuzzFuzz called it a
white-box fuzzer. We regard this tool as a gray-box fuzzer, because it only acquires partial knowledge (taint data flow) of the
target program. BuzzFuzz works as follows. First, it instruments
the target program to trace tainted well-formed input data. Then,
based on the collected taint propagation information, it calculates which part of the input data may influence the predefined
attack points (BuzzFuzz regards lib calls as potential attack
points) in the target program. Next, it mutates sensitive parts of
the input data to create new test cases. Finally, it executes the
new test cases and observes whether the target program crashes
or not. By this means, BuzzFuzz can find bugs in a more target-oriented and effective manner. More importantly, attack points
can be defined as specific lib calls, or vulnerability patterns, etc.,
depending on a developer’s concern.
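A hedged sketch of this taint-guided strategy (not BuzzFuzz itself) is shown below: only input bytes that taint analysis reports as reaching an attack point are mutated. taint_offsets and run_target are hypothetical callbacks standing in for the instrumentation and execution steps.

```python
import random

def taint_guided_fuzz(seed: bytes, taint_offsets, run_target, rounds=1000):
    """Mutate only the "hot" bytes that can influence predefined attack
    points (e.g., arguments of selected library calls)."""
    hot = list(taint_offsets(seed))    # byte offsets flagged by taint analysis
    crashes = []
    for _ in range(rounds):
        buf = bytearray(seed)
        for off in random.sample(hot, k=min(4, len(hot))):
            buf[off] = random.randrange(256)   # perturb a hot byte
        if run_target(bytes(buf)) == "crash":
            crashes.append(bytes(buf))
    return crashes
```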
D. How to Choose?
According to the likelihood of being triggered or found,
bugs can be classified into two categories: “shallow” bugs and
“hidden” bugs. The bugs that cause the target program to crash
in the early stage of execution are regarded as “shallow” bugs,
e.g., a potential divide-by-zero operation without any precedent
conditional branch. On the contrary, the bugs which exist deeply
in the program logic and are hard to trigger are regarded as
“hidden” bugs, such as bugs existing in complex conditional
branches. There is no standard way to identify “shallow” and
“hidden” bugs; therefore, the commonly used evaluation criteria
of a fuzzer are the code coverage it achieves and the number and
exploitability of bugs it finds. In general, a traditional black-box
fuzzer, which only uses random mutation method to generate
test cases, cannot reach high code coverage and thus usually
finds shallow bugs; however, it is lightweight, fast, and easy to
use. By comparison, white-box or gray-box fuzzers can achieve
higher code coverage and usually find more hidden bugs than
black-box fuzzers, but these fuzzers are more complicated to
build and the fuzzing processes are more time consuming than
black-box fuzzers.
Fuzzing techniques that only use simple mutation methods
are usually regarded as “dumb” fuzzing, such as the famous
“5-lines of Python” method used by Charlie Miller (https://fuzzinginfo.files.wordpress.com/2012/05/cmiller-csw-2010.pdf), which only
randomly mutates some bytes of an input file without knowing its format. In contrast, techniques that utilize the
input’s specification or other knowledge, or adopt runtime information (e.g., path coverage) to guide test case generation,
are usually considered “smart” fuzzing. In general, “dumb”
and “smart” fuzzing methods provide different cost/precision
tradeoffs and are suitable for different situations. For testers,
which kind of fuzzer to choose mainly depends on two factors:
1) the type of the target program and 2) the test requirements
(time/cost, etc.). If the input format of a target program
(e.g., a compiler, system call, or network protocol) is specific
or strict, it will be more effective to choose a grammar-based
fuzzer (many fuzzers of this kind are black-box, such as Trinity [32], but developers have recently built gray-box fuzzers targeting
this sort of software, like Syzkaller [15]). In other cases, testers
should consider more about the test requirements. If the primary
goal of testing is efficiency rather than precision or high-quality
output, black-box fuzzing will be a good choice. For instance,
if a software system has not been tested before and testers want to find
and eliminate “shallow” bugs as quickly as possible, black-box
fuzzing is a good start. On the contrary, if testers focus more
on the quality of output (i.e., the variety, exploitability, and
number of the discovered bugs) and want to achieve a higher
code coverage, gray-box (sometimes white-box) fuzzing is usually more suitable [14]. Compared with gray-box fuzzing, white-box fuzzing is not very practical in industry (although SAGE is a
famous white-box fuzzer in the market) because this technique
is very expensive (time/resource-consuming) and faces many
challenges (e.g., path explosion, memory modeling, constraint
solving, etc.). However, white-box fuzzing is a popular research
direction with lots of potential.
V. STATE OF THE ART IN FUZZING
According to the general process of fuzzing described in
Section III, the following questions should be considered at the
time of building a fuzzer:
1) how to generate or select seeds and other test cases;
2) how to validate those inputs against the specification of
the target program;
3) how to deal with those crash-inducing test cases;
4) how to leverage runtime information;
5) how to improve the scalability of fuzzing.
In this section, we address RQ1 by summarizing the main
contributions to the above five issues of fuzzing in the literature.
We review the papers related to seed generation and selection
in Section V-A, input validation and coverage in Section V-B,
the handling of crash-inducing test cases in Section V-C, the leveraging of runtime information in Section V-D, and scalability in
fuzzing in Section V-E.
A. Seed Generation and Selection
Given a target program to fuzz, the tester first needs to find the
input interface (e.g., stdin or a file) through which the target program
reads its input, then determine the file formats which the target
program can accept, and then choose a subset of collected seed
files to fuzz the target program. The quality of seed files may
highly influence the fuzzing result. Therefore, how to generate
or select suitable seed files in order to discover more bugs is an
important issue.
Some research works have been carried out to address this
problem. Rebert et al. [47] tested six kinds of selection algorithms:
1) the set cover algorithm from Peach;
2) a random seed selection algorithm;
3) a minimal set cover (same as minset, which can be calculated by the greedy algorithm);
4) a minimal set cover weighted by size;
5) a minimal set cover weighted by execution time;
6) a hotset algorithm (which fuzz tests each seed file for t
seconds, ranks them by the number of unique found bugs,
and returns the top several seed files in the list).
By spending 650 CPU days on Amazon Elastic Compute
Cloud for fuzzing ten applications, they drew the following
conclusions.
1) The algorithms employing heuristics perform better than
fully random sampling.
2) The unweighted minset algorithm performs the best
among these six algorithms (a minimal sketch of this greedy approach is given after this list).
3) A reduced set of seed files performs more efficiently than
the original set in practice.
4) A reduced set of seeds can be applied to different applications parsing the same file type.
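The following is a minimal sketch of the unweighted greedy minset computation referred to above; the coverage sets are assumed to have been obtained by tracing each seed beforehand, and the seed names are hypothetical.

```python
def greedy_minset(coverage: dict) -> list:
    """Greedy unweighted set cover: repeatedly pick the seed adding the
    most not-yet-covered blocks/edges until nothing new is added.

    coverage maps a seed name to the set of blocks/edges it covers."""
    remaining = dict(coverage)
    covered, minset = set(), []
    while remaining:
        best = max(remaining, key=lambda s: len(remaining[s] - covered))
        gain = remaining[best] - covered
        if not gain:
            break
        covered |= gain
        minset.append(best)
        del remaining[best]
    return minset

# Hypothetical example: two of the three seeds already cover everything.
cov = {"a.png": {1, 2, 3}, "b.png": {2, 3}, "c.png": {4, 5}}
print(greedy_minset(cov))   # ['a.png', 'c.png']
```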
Kargén and Shahmehri [48] claimed that by performing mutations on the generating program’s machine code instead of
directly on a well-formed input, the resulting test inputs are
closer to the format expected by the program under test, and
thus yield better code coverage. To test complex software (e.g.,
PDF readers) that usually take various inputs embedded with
multiple objects (e.g., fonts, pictures), Liang et al. [49] leveraged the structure information of the font files to select seed files
among the heterogeneous fonts. Skyfire [50] took as inputs a corpus and a grammar, and used the knowledge in the vast number
of existing samples to generate well-distributed seed inputs for
fuzzing programs that process highly structured inputs. An algorithm was presented in [51] to maximize the number of bugs
found for black-box mutational fuzzing given a program and a
seed input. The main idea behind this is to leverage white-box
symbolic analysis on an execution trace for a given program-seed pair to detect dependences among the bit positions of an
input, and then use this dependence relation to compute a probabilistically optimal mutation ratio for this program-seed pair.
Furthermore, considering fuzzing as an instance of the coupon
collector’s problem in probability analysis, Arcuri et al. [52]
proposed and proved nontrivial, optimal lower bounds for the
expected number of test cases sampled by random testing to
cover predefined targets, although how to achieve these bounds
in practice is not given.
For random generation of unit tests of object-oriented programs, Pacheco et al. [53] proposed to use feedback obtained
from executing the sequence as it is being constructed, in order
to guide the search toward sequences that yield new and legal
object states. Therefore, inputs that create redundant or illegal
states are never extended. However, Yatoh et al. [54] argued that
feedback guidance may overdirect the generation and limit the
diversity of generated tests, and proposed an algorithm named
feedback-controlled random testing, which controls the amount
of feedback adaptively.
How can we get original seed files in the first place? For
some open-source projects, the applications are published with
a vast amount of input data for testing, which can be freely obtained
as quality seeds for fuzzing. For example, FFmpeg automated
testing environment (FATE) [55] provides various kinds of test
cases which would be hard for testers to collect on their own.
Sometimes, the testing data are not publicly available, but the
developers are willing to exchange it with people who will report
program bugs in return. Some other open-source projects also
provide format converters. Therefore, with a variety of file sets
in a certain format, the tester can obtain decent seeds for fuzzing
by using a format converter. For instance, cwebp [56] can convert
TIFF/JPEG/PNG to WEBP images. Furthermore, reverse engineering is helpful to provide seed input for fuzzing. Prospex [57]
can extract network protocol specifications including protocol
state machines and use them to automatically generate input for
a stateful fuzzer. Adaptive random testing (ART) [58] modifies
random testing by sampling the space of tests and only executing those most distant, as determined by a distance metric over
inputs, from all previously executed tests. ART has not always
been shown to be effective for complex real-world programs
[59], and has mostly been applied to numeric input programs.
Compared to the approaches mentioned above, gathering seed
files by crawling the Internet is more general. Based on specific
characteristics (e.g., file extension, magic bytes, etc.), testers can
download required seed files. It is not a severe problem if the
gathered corpus is huge, since storage is cheap and the corpus
can be compacted to a smaller size while reaching equivalent
code coverage [60]. To reduce the number of fault-inserted files
and maintain the maximum test case coverage, Kim et al. [61]
proposed to analyze fields of binary files by tracking and analyzing stack frames, assembly codes, and registers as the target
software parses the files.
B. Input Validation and Coverage
The ability of automatically generating numerous test cases
to trigger unexpected behaviors of the target program is a significant advantage of fuzzing. However, if the target program
has an input validation mechanism, these test cases are quite likely
to be rejected in the early stage of execution. Therefore, how to
overcome this obstacle is a necessary consideration when testers
start to fuzz a program with such a mechanism.
1) Integrity Validation: During the transmission and storage,
errors may be introduced into the original data. In order to detect
these “distorted” data, the checksum mechanism is often used
in some file formats (e.g., PNG) and network protocols (e.g.,
TCP/IP) to verify the integrity of their input data. Using the
checksum algorithm (e.g., hash function), the original data are
attached with a unique checksum value. For the data receiver,
the integrity of the received data can be verified by recalculating the checksum value with the same algorithm and comparing it
to the attached one. In order to fuzz this kind of system, additional logic should be added into the fuzzer, so that the correct
checksum values of newly created test cases can be calculated.
Otherwise, the developer must utilize other methods to remove
this obstacle. Wang et al. [36], [62] proposed a novel method to
solve this problem and developed a fuzzer named TaintScope.
TaintScope first uses dynamic taint analysis and predefined rules
to detect potential checksum points and hot input bytes which
can contaminate sensitive application programming interfaces
(APIs) in the target program. Then, it mutates the hot bytes to
create new test cases and changes the checksum points to let all
created test cases pass the integrity validation. Finally, it fixes
the checksum value of those test cases which can make the target program crash by using symbolic execution and constraint
solving. By this means, it can create test cases which can both
pass the integrity validation and cause the program to crash.
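The checksum-fixing idea can be illustrated on the PNG format mentioned above. The sketch below is an assumption-laden example (not TaintScope's algorithm): it recomputes each chunk's CRC-32 with Python's zlib so that a mutated file still passes the reader's integrity check.

```python
import struct
import zlib

def fix_png_crcs(data: bytes) -> bytes:
    """Recompute the CRC-32 of every chunk in a (possibly mutated) PNG.

    PNG layout: an 8-byte signature followed by chunks of the form
    [4-byte length][4-byte type][data][4-byte CRC over type + data]."""
    out = bytearray(data[:8])              # keep the PNG signature
    pos = 8
    while pos + 12 <= len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        chunk_type = data[pos + 4:pos + 8]
        body = data[pos + 8:pos + 8 + length]
        crc = zlib.crc32(chunk_type + body) & 0xFFFFFFFF
        out += data[pos:pos + 8 + length] + struct.pack(">I", crc)
        pos += 12 + length
    return bytes(out)
```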
Given a set of sample inputs, Höschele and Zeller [63] used
dynamic tainting to trace the data flow of each input character,
and aggregated those input fragments into lexical and syntactical entities. The result is a context-free grammar that reflects valid
input structure, which is helpful for the later fuzzing process.
To mitigate coverage-based fuzzers’ limited ability to exercise
paths protected by magic-byte comparisons, Steelix [64] leveraged lightweight static analysis and binary instrumentation to
provide not only coverage information but comparison progress
information to a fuzzer as well. Such program state information
informs a fuzzer about where the magic bytes are located in the
test input and how to perform mutations to match the magic
bytes efficiently. There are some other efforts [65], [66], which
have made progress in mitigating this problem.
2) Format Validation: Network protocols, compilers, interpreters, etc., have strict requirements on input formats. Inputs
that do not meet format requirements will be rejected at the
beginning of program execution. Therefore, fuzzing this kind
of target systems needs additional techniques to generate test
cases which can pass the format validation. Most solutions of
this problem are to utilize input specific knowledge or grammar.
Ruiter and Poll [30] evaluated nine commonly used transport
layer security (TLS) protocol implementations by using black-box fuzzing in combination with state machine learning. They
provided a list of abstract messages (also known as input alphabet) which can be translated by test harness into concrete
messages sent to the system under test. Dewey et al. [67], [68]
proposed a novel method to generate well-typed programs which
use complicated type systems by means of constraint logic programming (CLP) and applied their method to generate Rust or
JavaScript programs. Cao et al. [69] first investigated the input
validation of Android system services and built an input validation vulnerability scanner for Android devices. This
scanner can create semivalid arguments which can pass the preliminary check implemented by the methods of the target system
service. There are some other works, such as [24], [25], [70],
and [71], that focus on solving this problem.
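To illustrate how grammar knowledge yields inputs that pass format validation, here is a hedged sketch of a grammar-based generator over a toy arithmetic-expression grammar; it is a simplification for illustration only, not the CLP-based technique of the cited work.

```python
import random

# Toy context-free grammar: nonterminals map to lists of alternatives.
GRAMMAR = {
    "<expr>": [["<expr>", "+", "<term>"], ["<term>"]],
    "<term>": [["<term>", "*", "<factor>"], ["<factor>"]],
    "<factor>": [["(", "<expr>", ")"], ["<number>"]],
    "<number>": [["0"], ["1"], ["42"]],
}

def generate(symbol: str = "<expr>", depth: int = 0, max_depth: int = 8) -> str:
    """Expand a nonterminal by picking random productions; once max_depth
    is reached, always take the last (non-recursive) alternative."""
    if symbol not in GRAMMAR:
        return symbol                                  # terminal
    alts = GRAMMAR[symbol]
    alt = alts[-1] if depth >= max_depth else random.choice(alts)
    return "".join(generate(s, depth + 1, max_depth) for s in alt)

print(generate())   # e.g. "(1+42)*0": syntactically valid by construction
```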
3) Environment Validation: Many software vulnerabilities reveal themselves only under a certain environment (i.e., particular configurations, a certain runtime status or condition, etc.). Typical fuzzing cannot guarantee the syntactic and
semantic validity of the input, or the percentage of the explored
input space. To mitigate these problems, Dai et al. [72] proposed
the configuration fuzzing technique whereby the configuration
of the running application is mutated at certain execution points,
in order to check for vulnerabilities that only arise in certain
conditions. FuzzDroid [73] can also automatically generate an
Android execution environment where an app exposes its malicious behavior. FuzzDroid combines an extensible set of static
and dynamic analyses through a search-based algorithm that
steers the app toward a configurable target location.
4) Input Coverage: Tsankov et al. [74] defined semivalid input coverage (SVCov), the first coverage criterion for fuzz
testing. The criterion is applicable whenever the valid inputs can
be defined by a finite set of constraints. By increasing coverage
under SVCov, they discovered a previously unknown vulnerability in a mature Internet Key Exchange (IKE) implementation.
To address shortcomings of existing grammar inference algorithms, which are severely slow and overgeneralized, Bastani
et al. [75] presented an algorithm for synthesizing a contextfree grammar encoding the language of valid program inputs
from a set of input examples and black-box access to the program. Unlike many methods which take the crash of the target
program to determine whether an input is effective, ArtFuzz
[76] aims at catching the noncrash buffer overflow vulnerabilities. It leverages type information and dynamically discovers
likely memory layouts to help the fuzzing process. If a buffer
border identified from the memory layout is exceeded, an error
will be reported.
Considering that increased diversity leads to improved coverage
and fault detection, Groce et al. [77] proposed a low-cost and
effective approach, called swarm testing, to increase the diversity of (randomly generated) test cases, which uses a diverse
swarm of test configurations, each of which deliberately omits
certain API calls or input features. Furthermore, to mitigate the
inability to focus on part of a system under test, directed swarm
testing [78] leveraged swarm testing and recorded statistical
data on past testing results to generate new random tests that
target any given source code element. Based on the observation
that developers sometimes accompany submitted patches with a
test case that partially exercises the new code, and this test case
could be easily used as a starting point for the symbolic exploration, Marinescu and Cadar [79] provided an automatic way
to generate test suites that achieve high coverage of software
patches.
C. Handling Crash-Inducing Test Cases
Software companies (e.g., Microsoft) and open-source
projects (e.g., Linux) often use fuzzing to improve the quality and reliability of their products [17]. Although fuzzing is
good at generating crash-inducing test cases, it is usually not
smart enough to analyze the importance of these test cases automatically. If fuzzing results in a bunch of raw crash-inducing
test cases, it takes testers a large amount of time to find different
bugs in the target program by analyzing these test cases. Due to
time or budget constraints, developers prefer to fix the severe
bugs first.
Currently, only a few efforts focus on how to filter raw fuzzing
outputs so as to make the fuzzing results more useful to testers.
Given a large collection of test cases, each of which can trigger program bugs, Chen et al. [26] proposed a ranking-based
method by which the test cases inducing different bugs are put on
the top in the list. This ranking-based method is more practical
compared with traditional clustering methods [80], [81]. Therefore, testers can focus on analyzing the top crash-inducing test
cases. Besides filtering crash-inducing test cases directly, there
are also some other methods to help reduce expensive manual work, such as generating unique crash-inducing test cases,
trimming test cases, and providing useful debug information.
The uniqueness of crash-inducing test cases can be quite reliably determined by the call stack of the target thread and the
address of the fault-causing instruction [82]. If two distinct test cases
cause the target program to crash with identical call stacks, it
is very likely that these two test cases correspond to the same
bug; therefore, only one of them needs to be reserved for manual analysis. On the contrary, if they cause the target program
to crash at the same location but with different stack traces, it
is quite possible that they correspond to two distinct bugs, and
thus both of them are worth analyzing separately. Compared
with recording the call stack, tracing execution path is a simpler
but less reliable way to determine uniqueness. AFL [27], one
of the most popular fuzzers, treats a crash-inducing test case as
unique if it finds a new path or does not find a common path.
This easy-to-implement approach is similar to the execution
path record method which is the foundation of AFL. Producing
unique crash-inducing test cases can help reduce the redundant fuzzing outputs, thus can save time and effort for manual
analysis.
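A hedged sketch of the stack-based deduplication described above is given below: crashes are bucketed by a hash of their top call-stack frames, which are assumed to come from a debugger or sanitizer report; the frame names in the example are hypothetical.

```python
import hashlib
from collections import defaultdict

def bucket_crashes(crashes, top_n=5):
    """Group crash-inducing test cases by a hash of the top N stack
    frames; one representative per bucket is usually enough to triage.

    `crashes` is an iterable of (test_case, frames) pairs, where frames
    is the symbolized call stack, innermost frame first."""
    buckets = defaultdict(list)
    for test_case, frames in crashes:
        key = hashlib.sha1("|".join(frames[:top_n]).encode()).hexdigest()
        buckets[key].append(test_case)
    return buckets

# Example: two crashes with identical top frames fall into one bucket.
crashes = [
    (b"input-1", ["png_read_row", "png_process_data", "main"]),
    (b"input-2", ["png_read_row", "png_process_data", "main"]),
    (b"input-3", ["memcpy", "png_read_row", "main"]),
]
print({k[:8]: v for k, v in bucket_crashes(crashes).items()})
```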
The size of test cases can greatly influence the efficiency of
manual analysis, because larger test cases cost more time to execute and locate fault-causing instructions. For a mutation-based
fuzzer, if low-quality seed files are used, the size of the generated test cases can grow iteratively under some kinds
of mutation methods. Therefore, during the process of fuzzing,
periodically trimming the generated test cases can improve the
overall efficiency, thus reducing the workload of manual analysis of crash-inducing test cases. The principle of trimming is
simple: the behavior of the trimmed test case should be identical to
that of the original one; in other words, they should follow the same
execution path. The general steps of trimming are sequentially
removing data blocks from a test case and re-evaluating the
rest of the test case; those data blocks that cannot influence the
execution path will be trimmed.
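The trimming procedure just described can be sketched as follows; path_of is a hypothetical callback that returns a fingerprint of the execution path taken by the target on a given input.

```python
def trim(test_case: bytes, path_of, block: int = 16) -> bytes:
    """Remove fixed-size blocks as long as the target still follows the
    same execution path as it does on the original test case."""
    baseline = path_of(test_case)
    data, pos = test_case, 0
    while pos < len(data):
        candidate = data[:pos] + data[pos + block:]
        if candidate and path_of(candidate) == baseline:
            data = candidate            # block was irrelevant: drop it
        else:
            pos += block                # block matters: keep it, move on
    return data
```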
Evaluating the exploitability of fuzzing outputs usually needs
code analysis and debugging work which can benefit from specialized tools, such as GDB, Valgrind [41], AddressSanitizer
[83], etc. These tools provide runtime context (e.g., the state of
call stack and registers, the address of the fault-inducing instruction,
etc.) of the target program or can detect particular kinds of program faults such as memory errors. With their assistance, testers
are able to discover and evaluate program bugs more efficiently.
Additionally, Pham et al. [84] presented a method for generating
inputs which reach a given “potentially crashing” location. The
test input generated by their method served as a witness of the
crash.
D. Leveraging Runtime Information
As two popular program analysis techniques, symbolic execution and dynamic taint analysis are often leveraged to make
fuzzing smart because they can provide much runtime information (such as code coverage, taint data flow) and help fuzzing
find “hidden” bugs [85], [86]. However, the problems they have
also hinder fuzzing from pursuing higher efficiency [87]. In this
section, we will discuss these problems, namely path explosion
and imprecise symbolic execution faced by concolic execution, and
undertainting and overtainting faced by dynamic taint analysis, and summarize the corresponding solutions. Knowing them
can help readers have a more thorough understanding of smart
fuzzing.
1) Path Explosion: Path explosion is an inherent and the
toughest problem in symbolic execution because conditional
branches in the target program are usually numerous; even a
small-size application can produce a huge number of execution
paths.
From the perspectives of program analysis methods and path search
algorithms, several research efforts have tried to mitigate this problem.
For example, function summaries [88], [89] are used to describe
properties of low-level functions so that high-level functions can
reuse them to reduce the number of execution paths. Redundant
path pruning is used to avoid execution of those paths which
have the same side effects as some previously covered paths.
For instance, Boonstoppel et al. [90] proposed a technique for
detecting and discarding a large number of redundant paths by
tracking read and write operations performed by the target program. The main idea behind this technique is that if a path
reaches a program point under the same condition as some previously explored paths, then the path leading to an identical
subsequent effect can thus be pruned. Besides, merging states
[91] obtained on different paths can also reduce the path search
space, but this method aggravates the solver’s burden.
Heuristic search algorithms, on the other hand, can explore
the most relevant execution paths as soon as possible within a
limited time. For example, random path selection [92] and automatic partial loop summarization [93] were proven successful
in practice mainly because they avoid being stuck when meeting
some tight loop which can rapidly create new states. Another
example is the control flow graph (CFG)-directed path selection
[94], which utilizes static control flow graph to guide the test
case generation to explore the closest uncovered branch. Experiments show that this greedy approach can help improve coverage faster and achieve a higher percentage of code coverage.
Besides, there is also generational search [38] which explores all
subpaths of each expanded execution, scores them, and finally
picks the path with the highest score for the next execution. Considering that existing coverage-based gray-box fuzzing tools visit too
many states in high-density regions, Böhme et al. [95] proposed
and implemented several strategies to force AFL [27] to generate fewer inputs for states in a high-density region via visiting
more states that are otherwise hidden in a low-density region. To
mitigate the path explosion problem, DeepFuzz [96] assigned
probabilities to execution paths and applied a new search heuristic that can delay path explosion effectively into deeper layers
of the tested binary. Given a suite of existing test cases, Zhang et
al. [97] leveraged test case reduction and prioritization methods
to improve the efficiency of seeded symbolic execution, with
the goal of gaining incremental coverage as quickly as possible.
2) Imprecise Symbolic Execution: The imprecision of symbolic execution is mainly caused by the modeling of complicated program structures (e.g., pointers), library or system calls, and
constraint solving. Some simplification methods need to be
applied in the aforementioned areas in order to make
symbolic execution practicable. Therefore, the key point for developers is to find a balance between scalability and precision.
Currently, some methods have been proposed to make symbolic execution more practical at the cost of precision. Pointer
operations are simplified in CUTE [98], which only
considers equality and inequality predicates when dealing with
symbolic pointer variables. Pointers are regarded as arrays in
KLEE [92]. When a pointer p may indirectly refer to N objects, KLEE copies the present state N times, and in each state it implements the proper read or write operation under the premise that p
is not beyond the bounds of the corresponding object. It will
use concrete values instead of symbolic values at the library or system call sites where the source code is not accessible [99].
In addition, constraint solving is augmented with many constraint
optimizations [100] (e.g., SAGE uses unrelated constraint elimination, local constraint caching, flip count limit, etc., to improve
the memory usage and speed when generating constraints) or
even input grammar (e.g., Godefroid et al. [36] proposed an
approach by which constraints based on input grammar can be
directly generated by symbolic execution. Then, the satisfiability
of these constraints will be verified by a customized constraint
solver which also leverages input grammar. Therefore, it can
generate highly structured inputs).
To fuzz floating-point (FP) code, which may also result
in imprecise symbolic execution, Godefroid and Kinder [102]
combined a lightweight local path-insensitive “may” static analysis of FP instructions with a high-precision whole-program
path-sensitive “must” dynamic analysis of non-FP instructions.
Fu and Su [103] turned the challenge of testing FP code into
the opportunity of applying unconstrained programming—the
mathematical solution for calculating function minimum points
over the entire search space. They derived a representing function from the FP code, any of whose minimum points is a test
input guaranteed to exercise a new branch of the tested program.
Fig. 4. Code snippet that shows implicit data flow.
3) Undertainting: Undertainting happens when implicit data
flows, in which data transmission is associated with control flow,
array operations, etc., are neglected. As shown in
Fig. 4, during the conversion from plain text to Rich Text Format, the value of the input variable is transferred into the output
array without a direct assignment. Therefore, if the input is tainted,
neglecting this implicit data flow will cause undertainting. Kang
et al. [44] have made progress toward solving this problem. According to their experiment, taint propagation for all implicit flows
leads to unacceptably large overtainting. Therefore, they
focused only on taint propagation for complete information-preserving implicit flows like the example shown in Fig. 4.
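Fig. 4 itself is not reproduced; the following hypothetical Python analogue shows a complete information-preserving implicit flow of the kind discussed: the output is fully determined by the tainted input, yet no instruction copies the input into the output directly.

```python
def to_upper_implicit(ch: str) -> str:
    """The result depends on `ch` only through the branch condition, so
    byte-level taint tracking that ignores control dependences would
    leave it untainted (undertainting)."""
    out = ""
    for letter in "abcdefghijklmnopqrstuvwxyz":
        if ch == letter:            # control dependence on the tainted input
            out = letter.upper()    # the assigned value itself is a constant
    return out

print(to_upper_implicit("q"))       # "Q": derived from the input without a direct copy
```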
4) Overtainting: Overtainting happens when taint propagation is not implemented at a fine-grained granularity. It causes
taint explosion and false positives. Yadegari et al. [104] proposed
an approach which fulfills taint propagation at bit level to mitigate this problem. Another method of dealing with this issue is to
use underapproximations to check existential “for-some-path”
properties. Godefroid [89] proposed a new approach to test generation where tests are derived from validity proofs of first-order
logic formulas, rather than satisfying assignments of quantifier-free first-order logic formulas as most current approaches do. For more detailed information about symbolic execution and taint analysis, refer to the specialized literature [39], [105].
E. Scalability in Fuzzing
Facing the size and complexity of real-world applications,
modern fuzzers tend to be either scalable but not effective in
exploring bugs located deeper in the target program, or not
scalable yet capable of penetrating deeper in the application.
Arcuri et al. [106] discussed the scalability of fuzzing, which,
under certain conditions, can fare better than a large class of
partition testing techniques.
Bounimova et al. [107] reported experiences with constraint-based white-box fuzzing in production across hundreds of large
Windows applications and over 500 machine years of computation from 2007 to 2013. They extended SAGE with logging and
control mechanisms to manage multimonth deployments over
hundreds of distinct application configurations. They claimed
their work as the first production use of white-box fuzzing and
the largest scale deployment of white-box fuzzing to date.
Some approaches leverage application-aware fuzzing or
cloud-based fuzzing services to address this issue. Rawat et
al. [108] presented an application-aware evolutionary fuzzing
strategy that does not require any prior knowledge of the application or input format. In order to maximize coverage and
explore deeper paths, they leveraged control- and data-flow features based on static and dynamic analysis to infer fundamental
properties of the application, which enabled much faster generation of interesting inputs compared to an application-agnostic
approach.
A method for improving scalability is to reduce the scope
of the analysis. Regression analysis is a well-known example
where the differences between program versions serve as the
basis to reduce the scope of the analysis. DiSE [109] combines
two phases: static analysis and symbolic execution. The set of
affected program instructions is generated in the first phase. The
information generated by the static analysis is then used to direct
symbolic execution to explore only the parts of the programs
affected by the changes, potentially avoiding a large number of
unaffected execution paths.
To test mobile apps in a scalable way, every test input needs
to be run under a variety of contexts, such as device heterogeneity, wireless network speeds, locations, and unpredictable
sensor inputs. The range of values for each context, e.g., location, can be very large. Liang et al. [110] presented Caiipa, a
cloud service for testing apps over an expanded mobile context
space in a scalable way, implemented on a cluster of VMs
and real devices that can emulate various combinations of contexts for mobile apps. Mobile vulnerability discovery pipeline
(MVDP) [111] is also a distributed black-box fuzzing system
for Android and iOS devices.
VI. TOOLS IN DIFFERENT APPLICATION AREAS
Fuzzing is a practical software testing technique which has
been widely applied in industry. Any software accepting user
inputs can be regarded as a fuzzing target. Currently, there
are various fuzzers targeting different software systems. In this
section, we answer RQ2 by investigating some of the most popular and efficient fuzzers classified by their application areas,
i.e., the type of their target software platform. Tables V and VI
summarize the typical fuzzers we are going to introduce, from
application areas and problem domains, respectively.
TABLE V: SUMMARIES OF THE TYPICAL FUZZERS
A. General Purpose Fuzzers
1) Peach: Peach [112] is a well-known general purpose
fuzzer of which the most common targets are drivers, file consumers, network protocols, embedded devices, systems, etc. It
is constructed with the following components: 1) predefined input format definitions called Peach Pits, which are available as
individual Pits or groups of related Pits called Pit Packs; 2) test
passes, which can weight mutators to perform more test cases;
and 3) minset, which helps pare down the file count for test
case coverage. Peach plays a critical role in this area because
it has many outstanding features, such as threat detection, out-of-the-box fuzzing definitions (Peach Pits), and scalable testing options. However, there are also some problems with Peach
(specifically with the open-source version). A major one is that it is
time-consuming to build a Pit file describing the target file format
using the syntax provided by Peach. Built on Peach, eFuzz [113]
tested smart metering devices for possible faults based on the communication protocol DLMS/COSEM, the standard protocol used in Europe.
Honggfuzz [114] is another general-purpose, security-oriented fuzzer.
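For illustration, the corpus-reduction idea behind a component such as minset can be approximated by a greedy, coverage-based selection. The sketch below is our simplification, not Peach's implementation; get_coverage stands in for running the instrumented target on a seed and returning the coverage it achieves.

# Greedy corpus minimization: keep only seeds that contribute new coverage.
# get_coverage() is a placeholder for executing the instrumented target.
from typing import Callable, Dict, List, Set

def minimize_corpus(seeds: List[bytes],
                    get_coverage: Callable[[bytes], Set[int]]) -> List[bytes]:
    cov: Dict[bytes, Set[int]] = {s: get_coverage(s) for s in seeds}
    # Consider seeds with the largest coverage first (greedy set cover).
    ordered = sorted(seeds, key=lambda s: len(cov[s]), reverse=True)
    kept, covered = [], set()
    for seed in ordered:
        new_edges = cov[seed] - covered
        if new_edges:                 # the seed exercises something not yet covered
            kept.append(seed)
            covered |= new_edges
    return kept

if __name__ == "__main__":
    corpus = [b"AAAA", b"BBBB", b"AB"]
    fake_cov = {b"AAAA": {1, 2}, b"BBBB": {2, 3}, b"AB": {1, 2, 3}}
    print(minimize_corpus(corpus, fake_cov.__getitem__))      # -> [b'AB']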
2) beSTORM: beSTORM [115] is a commercial black-box
fuzzer. It can be used to test the security of a target application
or to check the quality of networked hardware and software
products. It does not need source code but only the binaries of the
target program. The fuzzing strategy beSTORM takes is to first
test the likely, common failure-inducing areas and then expand
to a near-infinite range of attack variations; therefore, it can
quickly deliver results. beSTORM can be used to test protocols,
applications, hardware, files, Wi-Fi, and embedded device security
assurance (EDSA); for example, it is able to find bugs in applications
that implement the EDSA 402 standard.
TABLE VI
TYPICAL FUZZERS AND THEIR PROBLEM DOMAINS
Table VI rates each fuzzer on five attributes; the levels for each attribute are as follows:
1) Seed generation and selection
- The fuzzer can automatically generate seeds or adopts a seed selection algorithm.
- The fuzzer provides some high-quality seeds for testers to choose from (usually for mutation-based fuzzers).
- ×: The seeds are collected by testers manually.
- /: The fuzzer does not require seeds (usually for grammar-based fuzzers).
2) Input validation and coverage
- The fuzzer utilizes input grammars or other knowledge to generate test cases, or adopts some way to pass through input validation.
- The fuzzer uses some methods to mitigate the problems caused by input validation.
- ×: The fuzzer does not use any input information or approach to pass through input validation.
3) Handling crash-inducing test cases
- The fuzzer can analyze the found bugs automatically and generate a detailed bug report.
- The fuzzer can provide some useful information, such as a log file, to help subsequent bug analysis.
- ×: The fuzzer only generates crash-inducing test cases.
4) Leveraging runtime information
- The fuzzer uses runtime information to guide test case generation.
- ×: The fuzzer does not use any feedback information during runtime.
5) Scalability in fuzzing
- The fuzzer can test real-world applications effectively and has found plenty of bugs.
- The fuzzer is in its experimental stage and has been applied to some real-world programs.
- ×: The fuzzer can only test some experimental programs.
B. Fuzzers for Compilers and Interpreters
1) jsfunfuzz: jsfunfuzz [116] is a grammar-based black-box
fuzzer designed for Mozilla’s SpiderMonkey JavaScript engine.
It is the first publicly available JavaScript fuzzer. Since its development in 2007, it has found over 2000 bugs in SpiderMonkey. It combines differential testing with detailed knowledge
of the target application; therefore, it can efficiently find both
correctness-related bugs and crash-triggering bugs in different
JavaScript engines. However, for each new language feature,
jsfunfuzz has to be adapted to exercise that feature during
fuzzing.
2) Csmith: Csmith was proposed by Yang et al. [117] in
2011. It is a C compiler fuzzer that generates random C programs conforming to the C99 standard. It
utilizes random differential testing [118] to help find correctness
bugs which are caused by potentially undefined behavior and
other C-specific problems. Csmith has been used for years and
has found hundreds of previously unknown bugs in both commercial and open source C compilers (e.g., GNU Compiler Collection (GCC), low level virtual machine (LLVM)). Since it is
an open-source project, the latest information and version about
Csmith can be accessed at [119]. Although Csmith is a practical
fuzzer that is good at generating error-inducing test cases,
like many other fuzzers it does not prioritize the bugs it finds
by importance. Therefore, testers must spend a lot
of time assessing the novelty and severity of each bug.
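For illustration, the random differential testing loop enabled by Csmith can be sketched as follows. The program generator is stubbed with a trivial, deterministic C program, and the availability of the gcc and clang commands on the test machine is an assumption; Csmith's contribution is generating large, UB-free programs to feed such a loop.

# Random differential testing of two C compilers on one generated program.
# generate_program() stands in for a Csmith-like generator of deterministic,
# UB-free C programs; 'gcc' and 'clang' are assumed to be installed.
import os
import subprocess
import tempfile

def generate_program() -> str:
    # Placeholder for a Csmith-generated program.
    return ('#include <stdio.h>\n'
            'int main(void) { printf("%d\\n", 6 * 7); return 0; }\n')

def compile_and_run(compiler: str, src_path: str, workdir: str) -> str:
    exe = os.path.join(workdir, compiler + "-build")
    subprocess.run([compiler, "-O2", "-o", exe, src_path], check=True)
    return subprocess.run([exe], capture_output=True, text=True, timeout=10).stdout

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "test.c")
    with open(src, "w") as f:
        f.write(generate_program())
    outputs = {cc: compile_and_run(cc, src, tmp) for cc in ("gcc", "clang")}
    if len(set(outputs.values())) > 1:       # disagreement => likely compiler bug
        print("miscompilation candidate:", outputs)
    else:
        print("compilers agree:", outputs["gcc"].strip())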
3) LangFuzz: Inspired by jsfunfuzz, Holler et al. [120] presented LangFuzz in 2012. LangFuzz does not aim at a particular
language. So far, it has been tested on JavaScript and the hypertext
preprocessor (PHP). Applied to the SpiderMonkey JavaScript engine,
LangFuzz has found more than 500 previously unknown bugs [26].
Applied to the PHP interpreter, it also discovered 18 new defects
that result in crashes. LangFuzz makes use of both stochastic
generation and code mutation to create test cases, but regards
mutation as the primary technique. It is designed as a
language-independent fuzzer, but adapting it to a new language
requires some changes.
4) CLsmith: Lidbury et al. [70] leveraged random differential testing and equivalence modulo inputs (EMI) testing to
fuzz many-core compilers, identifying and reporting more
than 50 OpenCL compiler bugs, many of them in commercial implementations. Specifically, they adapted random
differential testing to the many-core setting by generating deterministic, communicating, feature-rich OpenCL kernels, and
they proposed and evaluated the injection of dead-by-construction code
to enable EMI testing in the context of OpenCL.
Other fuzzers in this category include MongoDB's JavaScript
fuzzer, which detected almost 200 bugs over the course of two
release cycles [121], and IFuzzer, a JavaScript interpreter
fuzzer based on genetic programming [122].
C. Fuzzers for Application Software
1) SAGE: SAGE [14] is a well-known white-box fuzzer developed by Microsoft. It is used to fuzz large file-reading Windows applications (e.g., document parsers, media players, image
processors, etc.) running on the x86 platform. Combining concolic
execution with a heuristic search algorithm to maximize code
coverage, SAGE tries its best to reveal bugs effectively. Since
2008, this tool has been running continuously on an average of
100-plus machines/cores, automatically fuzzing a few hundred Microsoft applications. SAGE was the first fuzzer to realize
the white-box fuzzing technique and to apply it to real-world applications. Nowadays, Microsoft is promoting an online fuzzing
project called Springfield [123]. It provides multiple methods
including Microsoft white-box fuzzing technology to find bugs
in the binary programs uploaded by customers. Future work of
SAGE consists of improving its search method, enhancing the
precision of its symbolic execution and increasing the capability
of its constraint solving for discovering more bugs [38].
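For illustration, the generational search at the core of SAGE [38] can be sketched as follows: the path constraint collected from one concrete run is re-solved with each branch condition negated in turn, and every satisfiable variant yields a new input. The path constraint below is hand-written and Z3's Python bindings (the z3-solver package) serve as the constraint solver; a real concolic engine records the constraint from an instrumented execution.

# Generational search in the style of SAGE: negate each branch condition of a
# recorded path constraint and solve for a new input. Requires the z3-solver
# package; the path constraint here is hand-written for illustration.
from z3 import Int, Not, Solver, sat

x = Int("x")                                       # symbolic stand-in for one input field
path_constraint = [x > 10, x % 2 == 0, x < 100]    # branch conditions of the seed run

def expand_generation(pc):
    """Yield one new concrete value per negated branch condition."""
    for i, cond in enumerate(pc):
        solver = Solver()
        solver.add(*pc[:i])        # keep the prefix of the path...
        solver.add(Not(cond))      # ...and flip the i-th branch
        if solver.check() == sat:
            yield i, solver.model()[x]

for flipped, new_x in expand_generation(path_constraint):
    print(f"flip branch {flipped}: try x = {new_x}")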
2) AFL: AFL [27] is a well-known coverage-guided
fuzzer. It gathers runtime path-coverage information through
code instrumentation. For open-source applications, the instrumentation is introduced at compile time; for binaries, it
is introduced at runtime via a modified QEMU
[124]. Test cases that explore new execution paths
have a higher chance of being chosen in the next round of mutation.
Experimental results show that AFL is efficient at finding
bugs in real-world targets, such as file compression libraries
and common image parsers. AFL supports C, C++,
and Objective-C programs as well as binary executables, and works on Linux-like
OSs. Besides, there is also some work to extend the application
scenarios of AFL, such as TriforceAFL [125], which is used to
fuzz kernel syscalls, WinAFL [126], which ports AFL to Windows, and the work from ORACLE [127], which uses AFL to
fuzz some filesystems. Although AFL is efficient and easy to
use, there is still room for improvement. Like many other brute-force
fuzzers, AFL achieves limited code coverage when the
input data are compressed, encrypted, or bundled with checksums. Besides, AFL takes more time when dealing with 64-bit
binaries and does not support fuzzing network services directly.
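For illustration, the coverage-guided feedback loop that AFL uses can be reduced to the toy sketch below; run_with_coverage is a placeholder for executing an instrumented target and returning its edge coverage, and the single-byte mutator is far simpler than AFL's real mutation stages.

# A toy coverage-guided fuzzing loop: inputs that reach new coverage are kept
# in the queue and mutated further. run_with_coverage() stands in for running
# an instrumented target; its "coverage" is just which prefix checks succeed.
import random
from typing import Set

def run_with_coverage(data: bytes) -> Set[int]:
    cov = {0}
    if data[:1] == b"F":
        cov.add(1)
    if data[:2] == b"FU":
        cov.add(2)
    if data[:4] == b"FUZZ":
        cov.add(3)
    return cov

def mutate(data: bytes) -> bytes:
    buf = bytearray(data)
    buf[random.randrange(len(buf))] = random.randrange(256)   # single random byte
    return bytes(buf)

queue, global_cov = [b"AAAA"], set()
for _ in range(20000):
    candidate = mutate(random.choice(queue))
    cov = run_with_coverage(candidate)
    if not cov <= global_cov:            # new coverage reached: keep the input
        global_cov |= cov
        queue.append(candidate)

print("queue:", queue, "covered:", sorted(global_cov))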
3) QuickFuzz: QuickFuzz [128] leverages Haskell’s
QuickCheck (the well-known property-based random testing
library) and Hackage (the community Haskell software repository) in conjunction with off-the-shelf bit-level mutational
fuzzers to provide automatic fuzzing for more than a dozen
common file formats, without requiring an external set of input
files or models of the file types involved. QuickFuzz generates invalid inputs using a mix of grammar-based
and mutation-based fuzzing techniques to discover unexpected
behavior in a target application.
To test server-side software, Davis et al. [129] presented
Node.fz, a scheduling fuzzer for event-driven programs,
embodied for server-side Node.js programs. Node.fz randomly
perturbs the execution of a Node.js program, allowing Node.js
developers to explore a variety of possible schedules. Work in
this category also includes Dfuzzer [130], a fuzzer for
D-Bus services.
To test mobile applications, several fuzzers have been presented in recent years, such as Droid-FF [131], a memory-leak
fuzzer [132], DroidFuzzer [133], an intent fuzzer [134], and the Android Ripper MFT tool [135] for Android apps.
D. Fuzzers for Network Protocols
1) Sulley: Sulley [136] is an open-source fuzzing framework
targeting network protocols. It utilizes a block-based approach
to generate individual “requests.” It provides many of the data
formats users need to build protocol descriptions. Before testing,
users use these formats to define all the necessary blocks,
which are mutated and merged during fuzzing to
create new test cases. Sulley can classify the detected faults,
work in parallel, and trace a fault down to the unique
test-case sequence that triggers it. However, it is currently not well
maintained. Boofuzz [137] is a successor to Sulley.
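For illustration, the block-based idea can be sketched as follows; this is a conceptual simplification in Python and does not reproduce Sulley's or Boofuzz's actual API. A request is described as a sequence of static and fuzzable blocks, and test cases are derived by substituting canned payloads into one fuzzable block at a time.

# Block-based request generation (conceptual sketch, not the Sulley API).
# Static blocks are emitted verbatim; fuzzable blocks are replaced, one at a
# time, by payloads from a small mutation library.
STATIC, FUZZABLE = "static", "fuzzable"

request = [
    (STATIC,   b"GET "),
    (FUZZABLE, b"/index.html"),            # path: will be mutated
    (STATIC,   b" HTTP/1.1\r\nHost: "),
    (FUZZABLE, b"example.com"),            # host: will be mutated
    (STATIC,   b"\r\n\r\n"),
]

PAYLOADS = [b"A" * 1024, b"%s%s%s%n", b"\x00" * 64, b"../../../../etc/passwd"]

def generate_cases(blocks):
    """Yield one test case per (fuzzable block, payload) pair."""
    for i, (kind, _) in enumerate(blocks):
        if kind != FUZZABLE:
            continue
        for payload in PAYLOADS:
            parts = [payload if j == i else value
                     for j, (_, value) in enumerate(blocks)]
            yield b"".join(parts)

cases = list(generate_cases(request))
print(len(cases), "test cases generated")    # 2 fuzzable blocks x 4 payloads = 8
# A protocol fuzzer would now send each case to the target and watch for faults.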
2) TLS-Attacker: Somorovsky [138] presented TLS-Attacker, an open-source framework for evaluating the security
of TLS libraries. TLS-Attacker allows security engineers to
create customized TLS message flows and arbitrarily modify
message contents by using a simple interface to test the behavior
of their libraries. It successfully found several vulnerabilities
in widely used TLS libraries, including OpenSSL, Botan, and
MatrixSSL.
There are some other works about fuzzing network protocols
[139], [140]. T-Fuzz [141] is a model-based fuzzer for robustness testing of telecommunication protocols, SecFuzz [142] targets the
IKE protocol, and both SNOOZE [143] and KiF [144], [145]
target the VoIP/SIP protocol.
E. Fuzzers for OS Kernels
Kernel components of an OS are difficult to fuzz because feedback
mechanisms (i.e., guided code coverage) cannot be easily applied. Additionally, nondeterminism due to interrupts, kernel
threads, and statefulness poses problems [146]. Furthermore, if
a process fuzzes its own kernel, a kernel crash severely impacts
the performance of the fuzzer because the OS must be rebooted.
1) Trinity: In recent years, Trinity [147] has gained a lot of
attention in the area of kernel fuzzing. It implements several
methods for passing semi-intelligent arguments to system calls. The argument-generation methods are as
follows (see the sketch at the end of this subsection): 1) If a system call expects a certain data type as an
argument (e.g., descriptor), it gets passed one; 2) if a system
call only accepts certain values as an argument (e.g., a ‘flags’
field), it has a list of all the valid flags that may be passed; and
3) if a system call only takes a range of values, a random value
passed to an argument usually fits that range. Trinity supports
a variety of architectures including x86-64, SPARC-64, S390x,
S390, PowerPC-64, PowerPC-32, MIPS, IA-64, i386, ARM,
Aarch64, and Alpha.
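For illustration, the three argument-generation rules above can be sketched in Python as follows (Trinity itself is written in C and actually issues the generated calls); the syscall specifications and flag values are hand-written for this example.

# Semi-intelligent syscall argument generation in the spirit of Trinity.
# Rule 1: expected data types get a real instance (e.g., an open descriptor).
# Rule 2: flag arguments are built by OR-ing valid flag values.
# Rule 3: ranged arguments get a random value inside the range.
import os
import random
from functools import reduce
from operator import or_

SYSCALL_SPECS = {
    "read": [("fd", "descriptor"), ("buf", "address"),
             ("count", ("range", 0, 1 << 16))],
    "mmap": [("addr", "address"), ("length", ("range", 1, 1 << 20)),
             ("prot",  ("flags", [0x1, 0x2, 0x4])),       # PROT_READ/WRITE/EXEC
             ("flags", ("flags", [0x01, 0x02, 0x20]))],   # MAP_SHARED/PRIVATE/ANONYMOUS
}

def gen_arg(domain):
    if domain == "descriptor":                       # rule 1
        return os.open("/dev/null", os.O_RDONLY)
    if domain == "address":                          # pointer-like junk value
        return random.choice([0, 0xDEADBEEF, 2 ** 48 - 1])
    kind = domain[0]
    if kind == "flags":                              # rule 2
        chosen = random.sample(domain[1], k=random.randint(1, len(domain[1])))
        return reduce(or_, chosen)
    if kind == "range":                              # rule 3
        return random.randint(domain[1], domain[2])

for name, spec in SYSCALL_SPECS.items():
    args = {arg: gen_arg(dom) for arg, dom in spec}
    print(name, args)     # a kernel fuzzer would invoke the syscall here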
2) Syzkaller: Syzkaller [15] is another fuzzer targeting
Linux kernels. It depends on predefined templates which specify
the argument domains of each system call. Unlike Trinity, it also
makes use of code coverage information to guide the fuzzing
process. Because Syzkaller combines coverage-guided and
template-based techniques, it works better than a fuzzer provided
only with the argument-usage patterns of system calls. This
tool is under active development, but the early results look impressive.
3) IOCTL Fuzzer: IOCTL Fuzzer [148] is a tool designed
to automatically search for vulnerabilities in Windows kernel
drivers. Currently, it supports Windows 7 (x32 and x64), 2008
Server, 2003 Server, Vista, and XP. If an IOCTL operation
conforms to the conditions specified in the configuration file, the
fuzzer replaces its input field with randomly generated data.
4) Kernel-AFL (kAFL): Schumilo et al. [149] proposed
coverage-guided kernel fuzzing in an OS-independent and
hardware-assisted way. They utilize a hypervisor to produce
coverage and Intel’s Processor Trace technology to provide
control flow information on running code. They developed a
framework called kAFL to assess the reliability or security
of Linux, MacOS, and Windows kernel components. Among
many crashes, they uncovered several flaws in the ext4 driver
for Linux, the HFS and APFS file systems of MacOS, and the
NTFS driver of Windows.
5) CAB-FUZZ: To discover the vulnerabilities of commercial off-the-shelf (COTS) operating systems (OSes), Kim et al.
[150] proposed CAB-FUZZ, a practical concolic testing tool to
explore relevant paths that are most likely to trigger bugs. This
fuzzer prioritized the boundary states of arrays and loops and exploited real programs interacting with COTS OSes to construct
proper contexts to explore deep and complex kernel states without debug information. It found 21 undisclosed unique crashes
in Windows 7 and Windows Server 2008, including three critical vulnerabilities. Five of the found vulnerabilities had existed
for 14 years and could be triggered even in the initial
version of Windows XP.
F. Fuzzers for Embedded Devices, Drivers and Components
1) YMIR: Kim et al. [29] proposed the automatic generation
of fuzzing grammars using API-level concolic testing, and implemented a tool (named YMIR) to automate white-box fuzz
testing on ActiveX controls. It takes an ActiveX control as input
and delivers fuzzing grammars as its output. API-level concolic
testing collects constraints at the library function level rather
than the instruction level, and thus may be faster but less accurate.
2) vUSBf: vUSBf [151] was first proposed at Black Hat
Europe 2014. It is a fuzzing framework for USB drivers. This
framework implements a virtual USB fuzzer based on Kernel
Virtual Machine (in Linux) and the USB redirection protocol in
QEMU. It allows the dynamic definition of several million test
cases using a simple XML configuration. Each test is marked
with a unique identifier and is thus reproducible. It can trigger the following bugs in Linux kernels and device drivers: null-pointer dereferences, kernel paging request failures, kernel panics, bad
page states, and segmentation faults. There are some other works
in this area, such as a cost-effective USB testing framework
[152], and VDF, a targeted evolutionary fuzzer of virtual devices
[153].
Besides the aforementioned fuzzers, there are also
many other practical tools, including perf_fuzzer [154] for the
perf_event_open() system call, libFuzzer [155] for library fuzzing, a Modbus/TCP fuzzer for internetworked industrial systems [156],
a fuzzer for I/O buses [157], a fuzzer for digital certificates
[158], Gaslight [159] for memory forensics frameworks, etc.
Moreover, combined with memory error detectors (e.g., Clang’s AddressSanitizer [83] and MemorySanitizer [160]), fuzzers can
be reinforced to expose deeply hidden bugs rather than only shallow
ones.
VII. FUTURE DIRECTIONS
In this section, we answer RQ3 by discussing some of the
possible future directions of the fuzzing technique. Although
we cannot accurately predict the future directions that the study
of fuzzing will follow, it is possible for us to identify and summarize some trends based on the reviewed papers, which may
suggest and guide directions of future research. We will discuss
the future work in the following directions, with the hope that
the discussion will inspire follow-up research and practice.
A. Input Validation and Coverage
Overly complex, sloppily specified, or incorrectly implemented input languages, which describe the set of valid inputs
an application has to handle, are the root causes of many security
vulnerabilities [161]. Some systems are strict on input formats
(e.g., network protocols, compilers and interpreters, etc.); inputs that do not satisfy the format requirement will be rejected
in the early stage of execution. In order to fuzz this kind of
target program, the fuzzer should generate test cases which can
pass the input validation. Much research has targeted this problem and made impressive progress, such as [162] for string
bugs, [163], [164] for integer bugs, [165] for e-mail filters, and
[166] for buffer bugs. Open issues in this area include dealing with floating-point (FP) operations (e.g., Csmith, the well-known C compiler
fuzzer, does not generate FP programs), applying existing techniques to other languages (e.g., applying CLP to the C language),
etc. Furthermore, Rawat et al. [108] demonstrated that inferring
input properties by analyzing application behavior is a viable
and scalable strategy to improve fuzzing performance, as well
as a promising direction for future research in the area. As we
mentioned in Section V-A, although TaintScope can locate the
checksum points accurately and increase the effectiveness of
fuzzing dramatically, there is still room for improvement. First,
it cannot deal with digital signatures and other secure check
schemes. Second, its effectiveness is highly influenced by
encrypted input data. Third, it ignores control-flow dependences
and does not instrument all kinds of x86 instructions. These are
still open problems.
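For illustration, once the checksum field and its algorithm have been identified, which is what a TaintScope-like analysis aims to do, a fuzzer can repair the field after every mutation so that test cases survive the integrity check. The input layout below (a payload followed by a 4-byte big-endian CRC-32) is hypothetical.

# Checksum-aware mutation: mutate the payload, then recompute the integrity
# field so the input still passes the target's check. The layout (payload
# plus trailing big-endian CRC-32) is a made-up example format.
import random
import struct
import zlib

def make_input(payload: bytes) -> bytes:
    return payload + struct.pack(">I", zlib.crc32(payload))

def checksum_ok(data: bytes) -> bool:               # the input-validation barrier
    payload, crc = data[:-4], struct.unpack(">I", data[-4:])[0]
    return zlib.crc32(payload) == crc

def mutate_and_repair(data: bytes) -> bytes:
    payload = bytearray(data[:-4])
    payload[random.randrange(len(payload))] ^= 0xFF  # blind byte mutation...
    return make_input(bytes(payload))                # ...followed by CRC repair

seed = make_input(b"HELLO, PARSER")
blind = bytearray(seed)
blind[0] ^= 0xFF                                     # naive mutation leaves a stale CRC
print("blind mutation passes the check:", checksum_ok(bytes(blind)))                # False
print("repaired mutation passes the check:", checksum_ok(mutate_and_repair(seed)))  # True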
B. Smart Fuzzing
Many other program analysis techniques are merged into
smart fuzzing [167], [168], such as concolic execution, dynamic
taint analysis, and so on. Although these techniques bring many
benefits, they also introduce some problems, such as path explosion and imprecise symbolic execution in concolic testing, and under-tainting and over-tainting in dynamic taint analysis. As an example,
Dolan-Gavitt et al. [169] have injected thousands of bugs into
eight real-world programs, including bash, tshark, and the GNU
Coreutils. Their evaluation found that a prominent fuzzer and
a symbolic execution-based bug finder were able to locate some
but not all injected bugs. Furthermore, fuzzing in a scalable
and efficient way is still challenging. Bounimova et al. [107]
presented key challenges with running white-box fuzzing on a
large scale, which involve symbolic execution, constraint generation and solving, long-running state-space
searches, diversity, fault tolerance, and always-on usage. These
problems all deserve to be studied in more depth.
C. Filtering Fuzzing Outputs
During a software development life cycle, time and budget
for fixing bugs are usually constrained. Therefore, the main
concern of the developer is to solve those severe bugs under
these constraints. For example, Podgurski et al. [80] proposed
automated support for classifying reported software failures to
facilitate prioritizing them and diagnosing their causes. Zhang
et al. [170] proposed to select test cases based on a test-case similarity metric to explore deep program semantics. Differential
testing may be helpful to determine the cost of evaluating test
results [171], [172]. In short, at present, there has been little
research on filtering more important failure-inducing test cases
from the large fuzzing outputs. This research direction is of
practical importance.
D. Seed/Input Generation and Selection
The result of fuzzing is correlated with the quality of
seed/input files. Therefore, how to select suitable seed files
in order to find more bugs is an important issue. The methods
and algorithms for handling test cases in adaptive random testing (ART) [58],
[173]–[176], which attempt to maximize the testing coverage of the input domain, may be useful. For example, Pacheco et al.
[53] presented a feedback-directed random test generation technique, where a predesigned input was executed and checked
against a set of contracts and filters. The result of the execution determines whether the input is redundant, illegal, contract
violating, or useful for generating more inputs. However, in
massive experiments, Arcuri and Briand [59] showed that ART
was highly inefficient even on trivial problems when accounting for distance calculations among test cases. Classfuzz [177]
mutated seed class files using a set of predefined mutation
operators, employed Markov Chain Monte Carlo sampling to
guide mutator selection, and used coverage uniqueness as a discipline for accepting representative ones. Shastry et al. [178]
proposed to automatically construct an input dictionary by statically analyzing program control and data flow, and the input
dictionary is supplied to an off-the-shelf fuzzer to influence input generation. Designing and implementing more effective, sound,
and accurate seed generation and selection algorithms remains an open
research problem.
E. Combining Different Testing Methods
As discussed in Section IV, black-box and white-box/gray-box fuzzing methods have their own advantages and disadvantages. Therefore, how to combine these techniques to build a
fuzzer that is both effective and efficient is an interesting research direction. There are a few attempts [124], [179], [48]
in this area; for example, SYMFUZZ [51] augmented black-box mutation-based fuzzing with a white-box technique, which
helps to calculate an optimal mutation ratio for the given
program-seed pairs. From a micro perspective, SYMFUZZ generates test cases in two main steps, one using white-box fuzzing and one using black-box fuzzing. From a macro perspective, however, the method
can also be regarded as gray-box fuzzing: because the mutation
ratio of the black-box process is computed during the
white-box process, the fuzzing as a whole utilizes partial knowledge of the target program. It is also an interesting direction to combine
fuzzing with other testing techniques. Chen et al. [180] reported
how metamorphic testing [181]–[183], a relatively new
testing method that checks relationships among the inputs
and outputs of multiple program executions, detected previously
unknown bugs in real-world critical applications, showing that
using diverse perspectives and combining multiple methods can
help software testing achieve higher reliability or security. Garn
and Simos [184] showed the applicability of a comprehensive
method utilizing combinatorial testing and fuzzing to the system
call interface of the Linux kernel.
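For illustration, a metamorphic relation replaces the missing oracle by relating the outputs of multiple executions. The Python sketch below checks the relation sin(x) = sin(pi - x) over many random inputs; no individual expected value needs to be known, yet any violation of the relation exposes a bug in the implementation under test.

# Metamorphic testing in miniature: check a relation between the outputs of
# two executions instead of comparing against a known expected value.
import math
import random

def program_under_test(x: float) -> float:
    return math.sin(x)          # stand-in for the implementation being tested

def check_relation(trials: int = 10000) -> bool:
    for _ in range(trials):
        x = random.uniform(-100.0, 100.0)
        source = program_under_test(x)
        follow_up = program_under_test(math.pi - x)
        if not math.isclose(source, follow_up, rel_tol=1e-9, abs_tol=1e-12):
            print("metamorphic relation violated for x =", x)
            return False
    return True

print("all follow-up outputs consistent with the relation:", check_relation())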
F. Combining Other Techniques With Fuzzing
Fuzzers are limited by the extent of test coverage and the
availability of suitable test cases. Since static analysis can perform a broader search for vulnerable code patterns, Shastry et al. [185] started from a handful of fuzzer-discovered program failures and implemented a simple yet effective match-ranking algorithm that used test coverage data to focus attention on matches comprising untested code, demonstrating that static analysis can effectively complement fuzzing.
Static analysis techniques, such as symbolic execution and control/data flow analysis, can provide useful structural information
for fuzzing [186]; however, symbolic execution has some limitations for fuzzing, thus leaving some open problems:
1) Only generic properties are checked—many deviations from
a specified behavior are not found; and 2) many programs are
not entirely amenable to symbolic execution because they give
rise to hard constraints so that some parts of a program remain
uncovered [187]. Havrikov [188] proposed a combination approach by which fuzzing can benefit from different lightweight
analyses. The analyses leveraged multiple information sources
apart from the target program, such as input and execution (e.g.,
descriptions of the targeted input format in the form of extended
context-free grammars) or hardware counters.
Machine learning techniques are helpful for automatically
generating input grammars for grammar-based fuzzing [189].
Optimization theory can also be used to build effective search
strategies in fuzzing [83], [190], [191]. Genetic programming
or genetic algorithms are used in [122], [192], and [193] to guide their
fuzzers. Dai et al. [194] proposed a novel UI fuzzing technique
aiming at running apps such that different execution paths can
be exercised, and this method required the tester to build a
comprehensive network profile. We believe that there is still
some room for improving the existing combination methods
and leveraging other techniques with fuzzing.
VIII. CONCLUSION
Fuzzing is an automatic and effective software testing technique that is able to discover both correctness and security
bugs. It can be classified into black-box, white-box, and gray-box categories according to how much information it acquires
from the target program. During the fuzzing process, the basic
way to find bugs is to generate numerous test cases that are
likely to trigger bug-inducing code fragments in the target program. However, there is no fixed pattern in fuzzing for generating
test cases, so doing it well largely depends on the developers’ creativity. We presented a survey on fuzzing covering 171 papers
published between January 1990 and June 2017. The results
of the survey show that fuzzing is a thriving topic with an increasing trend of contributions on the subject. Currently, techniques
merged with fuzzing include genetic algorithms, taint analysis, symbolic execution, coverage-guided methods, etc. Existing fuzzing tools have been widely applied to many kinds of
industrial products, including compilers, network protocols, applications, kernels, etc., ranging from binaries to source code,
and have found tens of thousands of software bugs, many of which
are exploitable. Last but not least, we discussed some open problems with regard to fuzzing. We encourage further research and
practice to address these problems toward a wide adoption of
fuzzing in the continuous integration of software systems.
ACKNOWLEDGMENT
The authors thank the anonymous reviewers for their valuable
comments that have helped improve this paper.
REFERENCES
[1] P. Oehlert, “Violating assumptions with fuzzing,” IEEE Security Privacy,
vol. 3, no. 2, pp. 58–62, Mar. 2005.
[2] B. P. Miller, L. Fredriksen, and B. So, “An empirical study of the reliability of UNIX utilities,” Commun. ACM, vol. 33, pp. 32–44, 1990.
[3] B. P. Miller et al., “Fuzz revisited: A re-examination of the reliability
of UNIX utilities and services,” Dept. Comput. Sci., Univ. Wisconsin-Madison, Madison, WI, USA, Tech. Rep. #1268, 1995.
[4] J. E. Forrester and B. P. Miller, “An empirical study of the robustness of windows NT applications using random testing,” in Proc. 4th
Conf. USENIX Windows Syst. Symp., Seattle, WA, USA, vol. 4, 2000,
pp. 1–10.
[5] B. P. Miller, G. Cooksey, and F. Moore, “An empirical study of the
robustness of MacOS applications using random testing,” in Proc. Int.
Workshop Random Test., 2006, pp. 46–54.
[6] R. Hamlet, “Random testing,” in Encyclopedia of Software Engineering.
New York, NY, USA: Wiley, 1994, pp. 970–978.
[7] G. McGraw, “Silver bullet talks with Bart Miller,” IEEE Security Privacy,
vol. 12, no. 5, pp. 6–8, Sep. 2014.
[8] J. Viide et al., “Experiences with model inference assisted fuzzing,” in
Proc. Conf. USENIX Workshop Offensive Technol., 2008, Art. no. 2.
[9] H. Yang, Y. Zhang, Y. Hu, and Q. Liu, “IKE vulnerability discovery
based on fuzzing,” Security Commun. Netw., vol. 6, no. 7, pp. 889–901,
2013.
[10] J. Yan, Y. Zhang, and D. Yang, “Structurized grammar-based fuzz testing
for programs with highly structured inputs,” Security Commun. Netw.,
vol. 6, no. 11, pp. 1319–1330, 2013.
[11] N. Palsetia, G. Deepa, F. A. Khan, P. S. Thilagam, and A. R. Pais,
“Securing native XML database-driven web applications from XQuery
injection vulnerabilities,” J. Syst. Softw., vol. 122, pp. 93–109, 2016.
[12] M. de Jonge and E. Visser, “Automated evaluation of syntax error recovery,” in Proc. 27th IEEE/ACM Int. Conf. Autom. Softw. Eng., 2012,
pp. 322–325.
[13] J. D. DeMott, R. J. Enbody, and W. F. Punch, “Systematic bug finding and
fault localization enhanced with input data tracking,” Comput. Security,
vol. 32, pp. 130–157, 2013.
[14] P. Godefroid, M. Y. Levin, and D. Molnar, “SAGE: Whitebox fuzzing
for security testing,” Queue, vol. 10, no. 1, pp. 20:20–20:27, 2012.
[15] D. Vyukov, Syzkaller—Linux Kernel Fuzzer. [Online]. Available:
https://github.com/google/syzkaller. Accessed on: Jul. 12, 2016.
[16] D. Babic, “SunDew: Systematic automated security testing,” in Proc.
24th ACM SIGSOFT Int. SPIN Symp. Model Checking Softw., Santa
Barbara, CA, USA, 2017, p. 10.
[17] J. DeMott, “The evolving art of fuzzing,” in Proc. DEF CON Conf.,
vol. 14, 2006, pp. 1–25.
[18] R. McNally, K. Yiu, D. Grove, and D. Gerhardy, “Fuzzing: The state of
the art,” DTIC Document, 2012.
[19] T. L. Munea, H. Lim, and T. Shon, “Network protocol fuzz testing for information systems and applications: A survey and taxonomy,” Multimed.
Tools Appl., vol. 75, no. 22, pp. 14745–14757, Nov. 2016.
[20] B. Kitchenham, Procedures for Performing Systematic Reviews, Keele
Univ., NICTA, Keele, UK, 2004.
[21] J. Webster and R. T. Watson, “Analyzing the past to prepare for the
future: Writing a literature review,” MIS Quart., vol. 26, pp. 1–12, 2002.
[22] M. Woo, S. K. Cha, S. Gottlieb, and D. Brumley, “Scheduling blackbox mutational fuzzing,” in Proc. 2013 ACM SIGSAC Conf. Comput.
Commun. Security, New York, NY, USA, 2013, pp. 511–522.
[23] S. Bekrar, C. Bekrar, R. Groz, and L. Mounier, “A taint based approach
for smart fuzzing,” in Proc. IEEE 5th Int. Conf. Softw. Test. Verification
Validation 2012, 2012, pp. 818–825.
[24] G. Wen, Y. Zhang, Q. Liu, and D. Yang, “Fuzzing the ActionScript virtual machine,” in Proc. 8th ACM SIGSAC Symp. Inf. Comput. Commun.
Security, New York, NY, USA, 2013, pp. 457–468.
[25] R. Brummayer and A. Biere, “Fuzzing and delta-debugging SMT
solvers,” in Proc. 7th Int. Workshop Satisfiability Modulo Theories, 2009,
pp. 1–5.
[26] Y. Chen et al., “Taming compiler fuzzers,” in Proc. 34th ACM SIGPLAN
Conf. Program. Lang. Design Implementation, New York, NY, USA,
2013, pp. 197–208.
[27] American Fuzzy Lop, AFL. [Online]. Available: http://lcamtuf.
coredump.cx/afl/. Accessed on: Jul. 12, 2016.
[28] E. Jääskelä, “Genetic algorithm in code coverage guided fuzz testing,”
Dept. Comput. Sci. Eng., Univ. Oulu, 2016.
[29] S. Y. Kim, S. D. Cha, and D.-H. Bae, “Automatic and lightweight grammar generation for fuzz testing,” Comput. Security, vol. 36, pp. 1–11,
2013.
[30] J. de Ruiter and E. Poll, “Protocol state fuzzing of TLS implementations,”
in Proc. 24th USENIX Security Symp., 2015, pp. 193–206.
[31] B. P. Miller, L. Fredriksen, and B. So, “An empirical study of the reliability of UNIX utilities,” Commun. ACM, vol. 33, pp. 32–44, 1990.
[32] D. Jones, “Trinity: A linux system call fuzz tester.” [Online]. Available:
http://codemonkey.org.uk/projects/trinity/. Accessed on: Jul. 12, 2016.
[33] E. Bazzoli, C. Criscione, F. Maggi, and S. Zanero, “XSS PEEKER:
Dissecting the XSS exploitation techniques and fuzzing mechanisms
of blackbox web application scanners,” in Proc. 31st IFIP Int. Conf.
Inf. Security Privacy, Ghent, Belgium, May 30–Jun. 1, 2016, vol. 471,
pp. 243–258.
[34] F. Duchène, S. Rawat, J. L. Richier, and R. Groz, “LigRE: Reverseengineering of control and data flow models for black-box XSS detection,” in Proc. 20th Working Conf. Reverse Eng., 2013, pp. 252–261.
[35] W. Drewry and T. Ormandy, “Flayer: Exposing application internals,”
in Proc. 1st USENIX Workshop Offensive Technol., Boston, MA, USA,
Aug. 6, 2007, pp. 1–9.
[36] T. Wang, T. Wei, G. Gu, and W. Zou, “TaintScope: A checksum-aware
directed fuzzing tool for automatic software vulnerability detection,” in
Proc. IEEE Symp. Security Privacy, 2010, pp. 497–512.
[37] V. Ganesh, T. Leek, and M. Rinard, “Taint-based directed whitebox
fuzzing,” in Proc. IEEE 31st Int. Conf. Softw. Eng., 2009, pp. 474–484.
[38] P. Godefroid, M. Y. Levin, and D. A. Molnar, “Automated whitebox fuzz
testing,” in Proc. Netw. Distrib. Syst. Security Symp., San Diego, CA,
USA, Feb. 10–13, 2008, pp. 1–16.
[39] C. Cadar and K. Sen, “Symbolic execution for software testing: Three
decades later,” Commun. ACM, vol. 56, no. 2, pp. 82–90, 2013.
[40] C.-K. Luk et al., “Pin: Building customized program analysis tools
with dynamic instrumentation,” in Proc. ACM SIGPLAN Conf. Program.
Lang. Des. Implementation, New York, NY, USA, 2005, pp. 190–200.
[41] N. Nethercote and J. Seward, “Valgrind: A framework for heavyweight
dynamic binary instrumentation,” in Proc. 28th ACM SIGPLAN Conf.
Program. Lang. Des. Implementation, New York, NY, USA, 2007,
pp. 89–100.
[42] R. L. J. Seagle, “A framework for file format fuzzing with genetic algorithms,” Ph.D. dissertation, Univ. Tennessee, Knoxville, TN, USA,
2012.
[43] Y.-H. Choi, M.-W. Park, J.-H. Eom, and T.-M. Chung, “Dynamic binary
analyzer for scanning vulnerabilities with taint analysis,” Multimed. Tools
Appl., vol. 74, no. 7, pp. 2301–2320, 2015.
[44] M. G. Kang, S. McCamant, P. Poosankam, and D. Song, “DTA++:
Dynamic taint analysis with targeted control-flow propagation,” in Proc.
Netw. Distrib. Syst. Security Symp., San Diego, CA, USA, Feb. 6–9,
2011, pp. 1–14.
[45] S. Bekrar, C. Bekrar, R. Groz, and L. Mounier, “Finding software vulnerabilities by smart fuzzing,” in Proc. 4th IEEE Int. Conf. Softw. Test.
Verification Validation, 2011, pp. 427–430.
[46] S. K. Fayaz, T. Yu, Y. Tobioka, S. Chaki, and V. Sekar, “BUZZ: Testing
context-dependent policies in stateful networks,” in Proc. USENIX Symp.
Netw. Syst. Des. Implementation, 2016, pp. 275–289.
[47] A. Rebert et al., “Optimizing seed selection for fuzzing,” in Proc. 23rd
USENIX Security Symp., San Diego, CA, USA, 2014, pp. 861–875.
[48] U. Kargén and N. Shahmehri, “Turning programs against each other:
High coverage fuzz-testing using binary-code mutation and dynamic
slicing,” in Proc. 10th Joint Meeting Found. Softw. Eng., New York, NY,
USA, 2015, pp. 782–792.
[49] H. Liang, Y. Wang, H. Cao, and J. Wang, “Fuzzing the font parser
of compound documents,” in Proc. 4th IEEE Int. Conf. Cyber Security Cloud Comput., New York, NY, USA, Jun. 26–28, 2017, pp. 237–
242.
[50] J. Wang, B. Chen, L. Wei, and Y. Liu, “Skyfire: Data-driven seed generation for fuzzing,” in Proc. IEEE Symp. Security Privacy, 2017, pp. 579–
594.
[51] S. K. Cha, M. Woo, and D. Brumley, “Program-adaptive mutational fuzzing,” in Proc. IEEE Symp. Security Privacy, 2015,
pp. 725–741.
[52] A. Arcuri, M. Z. Z. Iqbal, and L. C. Briand, “Formal analysis of the
effectiveness and predictability of random testing,” in Proc. 19th Int.
Symp. Softw. Test. Anal., Trento, Italy, Jul. 12–16, 2010, pp. 219–230.
[53] C. Pacheco, S. K. Lahiri, M. D. Ernst, and T. Ball, “Feedback-directed
random test generation,” in Proc. Int. Conf. Softw. Eng., 2007, pp. 75–84.
[54] K. Yatoh, K. Sakamoto, F. Ishikawa, and S. Honiden, “Feedbackcontrolled random test generation,” in Proc. Int. Symp. Softw. Test. Anal.
2015, Baltimore, MD, USA, Jul. 12–17, 2015, pp. 316–326.
[55] FFmpeg. [Online]. Available: http://samples.ffmpeg.org/. Accessed on:
Dec. 15, 2016.
[56] cwebp | WebP, cwebp, Google Developers. [Online]. Available:
https://developers.google.com/speed/webp/docs/cwebp. Accessed on:
Dec. 15, 2016.
[57] P. M. Comparetti, G. Wondracek, C. Kruegel, and E. Kirda, “Prospex:
Protocol specification extraction,” in Proc. 30th IEEE Symp. Security
Privacy, 2009, pp. 110–125.
[58] T. Y. Chen, H. Leung, and I. K. Mak, “Adaptive random testing,” in Proc.
Annu. Asian Comput. Sci. Conf., vol. 3321, 2004, pp. 320–329.
[59] A. Arcuri and L. C. Briand, “Adaptive random testing: An illusion of
effectiveness?” in Proc. 20th Int. Symp. Softw. Test. Anal., Toronto, ON,
Canada, Jul. 17–21, 2011, pp. 265–275.
[60] M. Jurczyk, “Effective file format fuzzing-thoughts techniques and results,” in Proc. Black Hat Eur. Conf., London, U.K., 2016, pp. 1–133.
[61] H. C. Kim, Y. H. Choi, and D. H. Lee, “Efficient file fuzz testing using
automated analysis of binary file format,” J. Syst. Archit., vol. 57, no. 3,
pp. 259–268, 2011.
[62] T. Wang, T. Wei, G. Gu, and W. Zou, “Checksum-aware fuzzing combined with dynamic taint analysis and symbolic execution,” ACM Trans.
Inf. Syst. Security, vol. 14, no. 2, pp. 15:1–15:28, 2011.
[63] M. Höschele and A. Zeller, “Mining input grammars from dynamic
taints,” in Proc. 31st IEEE/ACM Int. Conf. Autom. Softw. Eng., 2016,
pp. 720–725.
[64] Y. Li, B. Chen, M. Chandramohan, S.-W. Lin, Y. Liu, and A. Tiu, “Steelix:
Program-state based binary fuzzing,” in Proc. 11th Joint Meeting Found.
Softw. Eng., New York, NY, USA, 2017, pp. 627–637.
[65] X. Y. Zhu and Z. Y. Wu, “A new fuzzing technique using niche genetic
algorithm,” Adv. Mater. Res., vol. 756, pp. 4050–4058, 2013.
[66] X. Zhu, Z. Wu, and J. W. Atwood, “A new fuzzing method using multi
data samples combination,” J. Comput., vol. 6, no. 5, pp. 881–888,
May 2011.
[67] K. Dewey, J. Roesch, and B. Hardekopf, “Fuzzing the rust typechecker
using CLP (T),” in Proc. 30th IEEE/ACM Int. Conf. Autom. Softw. Eng.,
Lincoln, NE, USA, Nov. 9–13, 2015, pp. 482–493.
[68] K. Dewey, J. Roesch, and B. Hardekopf, “Language fuzzing using constraint logic programming,” in Proc. 29th ACM/IEEE Int. Conf. Autom.
Softw. Eng., New York, NY, USA, 2014, pp. 725–730.
[69] C. Cao, N. Gao, P. Liu, and J. Xiang, “Towards analyzing the input
validation vulnerabilities associated with android system services,” in
Proc. 31st Annu. Comput. Security Appl. Conf., New York, NY, USA,
2015, pp. 361–370.
[70] C. Lidbury, A. Lascu, N. Chong, and A. F. Donaldson, “Many-core
compiler fuzzing,” in Proc. 36th ACM SIGPLAN Conf. Program. Lang.
Des. Implementation, New York, NY, USA, 2015, pp. 65–76.
[71] J. Zhao, Y. Wen, and G. Zhao, “H-Fuzzing: A new heuristic method for
fuzzing data generation,” in Network and Parallel Computing, E. Altman
and W. Shi, Eds. Berlin, Germany: Springer, 2011, pp. 32–43.
[72] H. Dai, C. Murphy, and G. E. Kaiser, “CONFU: Configuration fuzzing
testing framework for software vulnerability detection,” Int. J. Secure
Softw. Eng., vol. 1, no. 3, pp. 41–55, 2010.
[73] S. Rasthofer, S. Arzt, S. Triller, and M. Pradel, “Making malory behave
maliciously: Targeted fuzzing of android execution environments,” in
Proc. 39th Int. Conf. Softw. Eng., Piscataway, NJ, USA, 2017, pp. 300–
311.
[74] P. Tsankov, M. T. Dashti, and D. A. Basin, “Semi-valid input coverage
for fuzz testing,” in Proc. Int. Symp. Softw. Test. Anal., 2013, pp. 56–
66.
[75] O. Bastani, R. Sharma, A. Aiken, and P. Liang, “Synthesizing program
input grammars,” in Proc. 38th ACM SIGPLAN Conf. Program. Lang.
Des. Implementation, New York, NY, USA, 2017, pp. 95–110.
[76] K. Chen, Y. Zhang, and P. Liu, “Dynamically discovering likely memory
layout to perform accurate fuzzing,” IEEE Trans. Rel., vol. 65, no. 3,
pp. 1180–1194, Sep. 2016.
[77] A. Groce, C. Zhang, E. Eide, Y. Chen, and J. Regehr, “Swarm testing,” in
Proc. Int. Symp. Softw. Test. Anal., Minneapolis, MN, USA, Jul. 15–20,
2012, pp. 78–88.
[78] M. A. Alipour, A. Groce, R. Gopinath, and A. Christi, “Generating focused random tests using directed swarm testing,” in Proc. 25th Int.
Symp. Softw. Test. Anal., Saarbrücken, Germany, Jul. 18–20, 2016,
pp. 70–81.
[79] P. D. Marinescu and C. Cadar, “High-coverage symbolic patch testing,” in Proc. 19th Int. Workshop Model Checking Softw., Oxford, U.K.,
vol. 7385, Jul. 23–24, 2012, pp. 7–21.
[80] A. Podgurski et al., “Automated support for classifying software failure
reports,” in Proc. 25th Int. Conf. Softw. Eng., Washington, DC, USA,
2003, pp. 465–475.
[81] P. Francis, D. Leon, M. Minch, and A. Podgurski, “Tree-based methods
for classifying software failures,” in Proc. 15th Int. Symp. Softw. Rel.
Eng., Saint-Malo, Bretagne, France, Nov. 2–5, 2004, pp. 451–462.
[82] Research Insights Volume 9—Modern Security Vulnerability Discovery,
NCC Group, 2016. [Online]. Available: https://www.nccgroup.trust/
uk/our-research/research-insights-vol-9-modern-security-vulnerabilitydiscovery/. Accessed on: Nov. 15, 2016.
[83] K. Serebryany, D. Bruening, A. Potapenko, and D. Vyukov, “AddressSanitizer: A fast address sanity checker,” in Proc. USENIX Annu. Tech.
Conf., Boston, MA, USA, Jun. 13–15, 2012, pp. 309–318.
[84] V. T. Pham, W. B. Ng, K. Rubinov, and A. Roychoudhury, “Hercules:
Reproducing crashes in real-world application binaries,” in Proc. 37th
IEEE Int. Conf. Softw. Eng., vol. 1, 2015, pp. 891–901.
[85] A. Lanzi, L. Martignoni, M. Monga, and R. Paleari, “A smart fuzzer for
x86 executables,” in Proc. 3rd Int. Workshop Softw. Eng. Secure Syst.,
2007, pp. 1–8.
[86] I. Haller, A. Slowinska, M. Neugschwandtner, and H. Bos, “Dowsing for
overflows: A guided fuzzer to find buffer boundary violations,” in Proc.
22th USENIX Security Symp., Washington, DC, USA, Aug. 14–16, 2013,
pp. 49–64.
[87] Y. Shoshitaishvili et al., “SOK: (State of) The art of war: Offensive
techniques in binary analysis,” in Proc. IEEE Symp. Security Privacy,
2016, pp. 138–157.
[88] P. Godefroid, “Compositional dynamic test generation,” in Proc. 34th
Annu. ACM SIGPLAN-SIGACT Symp. Principles Program. Lang., New
York, NY, USA, 2007, pp. 47–54.
[89] P. Godefroid, “Higher-order test generation,” in Proc. 32nd ACM SIGPLAN Conf. Program. Lang. Des. Implementation, New York, NY, USA,
2011, pp. 258–269.
[90] P. Boonstoppel, C. Cadar, and D. R. Engler, “RWset: Attacking path explosion in constraint-based test generation,” in Proc. 14th Int. Conf. Tools
Algorithms Construction Anal. Syst., Budapest, Hungary, vol. 4963, Mar.
29–Apr. 6, 2008, pp. 351–366.
[91] V. Kuznetsov, J. Kinder, S. Bucur, and G. Candea, “Efficient state merging in symbolic execution,” in Proc. 33rd ACM SIGPLAN Conf. Program. Lang. Des. Implementation, New York, NY, USA, 2012, pp. 193–
204.
[92] C. Cadar, D. Dunbar, and D. R. Engler, “KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs,”
in Proc. 8th USENIX Symp. Operating Syst. Des. Implementation, San
Diego, CA, USA, Dec. 8–10, 2008, pp. 209–224.
[93] P. Godefroid and D. Luchaup, “Automatic partial loop summarization in
dynamic test generation,” in Proc. Int. Symp. Softw. Test. Anal., Toronto,
ON, Canada, 2011, pp. 23–33.
[94] J. Burnim and K. Sen, “Heuristics for scalable dynamic test generation,”
in Proc. 23rd IEEE/ACM Int. Conf. Autom. Softw. Eng., L’Aquila, Italy,
Sep. 15–19, 2008, pp. 443–446.
[95] M. Böhme, V.-T. Pham, and A. Roychoudhury, “Coverage-based greybox fuzzing as Markov chain,” in Proc. ACM SIGSAC Conf. Comput.
Commun. Security, New York, NY, USA, 2016, pp. 1032–1043.
[96] K. Böttinger and C. Eckert, “DeepFuzz: Triggering vulnerabilities deeply
hidden in binaries,” in Proc. 13th Int. Conf. Detection Intrusions Malware
Vulnerability Assessment, San Sebastián, Spain, Jul. 7–8, 2016, pp. 25–
34.
[97] C. Zhang, A. Groce, and M. A. Alipour, “Using test case reduction and
prioritization to improve symbolic execution,” in Proc. Int. Symp. Softw.
Test. Anal., San Jose, CA, USA, Jul. 21–26, 2014, pp. 160–170.
[98] K. Sen, D. Marinov, and G. Agha, “CUTE: A concolic unit testing
engine for C,” in Proc. 10th Eur. Softw. Eng. Conf. 13th ACM SIGSOFT
Int. Symp. Found. Softw. Eng., New York, NY, USA, 2005, pp. 263–272.
[99] V. Chipounov, V. Kuznetsov, and G. Candea, “S2E: A platform for invivo multi-path analysis of software systems,” in Proc. 16th Int. Conf.
Archit. Support Program. Lang. Oper. Syst., Newport Beach, CA, USA,
Mar. 5–11, 2011, pp. 265–278.
[100] M. Mouzarani, B. Sadeghiyan, and M. Zolfaghari, “A smart fuzzing
method for detecting heap-based vulnerabilities in executable codes,”
Security Commun. Netw., vol. 9, no. 18, pp. 5098–5115, 2016.
[101] P. Godefroid, A. Kiezun, and M. Y. Levin, “Grammar-based whitebox
fuzzing,” in Proc. 29th ACM SIGPLAN Conf. Program. Lang. Des. Implementation, New York, NY, USA, 2008, pp. 206–215.
[102] P. Godefroid and J. Kinder, “Proving memory safety of floating-point
computations by combining static and dynamic program analysis,” in
Proc. 19th Int. Symp. Softw. Test. Anal., Trento, Italy, Jul. 12–16, 2010,
pp. 1–12.
[103] Z. Fu and Z. Su, “Achieving high coverage for floating-point code via
unconstrained programming,” in Proc. 38th ACM SIGPLAN Conf. Program. Lang. Des. Implementation, New York, NY, USA, 2017, pp. 306–
319.
[104] B. Yadegari and S. Debray, “Bit-level taint analysis,” in Proc. 14th
IEEE Int. Working Conf. Source Code Anal. Manipulation, Victoria,
BC, Canada, Sep. 28–29, 2014, pp. 255–264.
[105] E. J. Schwartz, T. Avgerinos, and D. Brumley, “All you ever wanted
to know about dynamic taint analysis and forward symbolic execution (but might have been afraid to ask),” in Proc. 31st IEEE Symp.
Security Privacy, Berkeley/Oakland, CA, USA, May 16–19, 2010,
pp. 317–331.
[106] A. Arcuri, M. Z. Iqbal, and L. Briand, “Random testing: Theoretical
results and practical implications,” IEEE Trans. Softw. Eng., vol. 38,
no. 2, pp. 258–277, Mar. 2012.
[107] E. Bounimova, P. Godefroid, and D. A. Molnar, “Billions and billions of
constraints: Whitebox fuzz testing in production,” in Proc. 35th Int. Conf.
Softw. Eng., San Francisco, CA, USA, May 18–26, 2013, pp. 122–131.
[108] S. Rawat, V. Jain, A. Kumar, L. Cojocar, C. Giuffrida, and H. Bos,
“VUzzer: Application-aware evolutionary fuzzing,” in Proc. 24th Annu.
Netw. Distrib. Syst. Security Symp., San Diego, CA, USA, Feb. 26–Mar.
1, 2017.
[109] S. Person, G. Yang, N. Rungta, and S. Khurshid, “Directed incremental
symbolic execution,” in Proc. 32nd ACM SIGPLAN Conf. Program.
Lang. Des. Implementation, San Jose, CA, USA, Jun. 4–8, 2011, pp. 504–
515.
[110] C.-J. M. Liang et al., “Caiipa: Automated large-scale mobile app testing through contextual fuzzing,” in Proc. 20th Annu. Int. Conf. Mobile
Comput. Netw., 2014, pp. 519–530.
[111] L. W. Hao, M. S. Ramanujam, and S. P. T. Krishnan, “On designing an efficient distributed black-box fuzzing system for mobile devices,” in Proc. 10th ACM Symp. Inf. Comput. Commun. Security, 2015,
pp. 31–42.
[112] Peach Fuzzer: Discover Unknown Vulnerabilities, Peach, Peach Fuzzer.
[Online]. Available: http://www.peachfuzzer.com/. Accessed on: Jul. 13,
2016.
[113] H. Dantas, Z. Erkin, C. Doerr, R. Hallie, and G. van der Bij, “eFuzz:
A fuzzer for DLMS/COSEM electricity meters,” in Proc. 2nd Workshop
Smart Energy Grid Security, Scottsdale, AZ, USA, 2014, pp. 31–38.
[114] Honggfuzz by Google, Honggfuzz. [Online]. Available: https://google.
github.io/honggfuzz/. Accessed on: Jul. 13, 2016.
[115] Dynamic Testing (Fuzzing) on the ISASecure EDSA Certification 402 Ethernet by beSTORM, beSTORM. [Online]. Available:
http://www.beyondsecurity.com/dynamic_fuzzing_testing_embedded_
device_security_assurance_402_ethernet. Accessed on: Jul. 19, 2016.
[116] MozillaSecurity/funfuzz, jsfunfuzz, GitHub. [Online]. Available:
https://github.com/MozillaSecurity/funfuzz. Accessed on: Dec. 16,
2016.
[117] X. Yang, Y. Chen, E. Eide, and J. Regehr, “Finding and understanding bugs in C compilers,” in Proc. 32nd ACM SIGPLAN Conf.
Program. Lang. Des. Implementation, New York, NY, USA, 2011,
pp. 283–294.
[118] W. M. McKeeman, “Differential testing for software,” Digit. Tech. J.,
vol. 10, no. 1, pp. 100–107, 1998.
[119] Csmith. [Online]. Available: https://embed.cs.utah.edu/csmith/. Accessed on: Dec. 16, 2016.
[120] C. Holler, K. Herzig, and A. Zeller, “Fuzzing with code fragments,” in
Proc. 21th USENIX Security Symp., Bellevue, WA, USA, Aug. 8–10,
2012, pp. 445–458.
[121] R. Guo, “MongoDB’s JavaScript fuzzer,” Commun. ACM, vol. 60, no. 5,
pp. 43–47, 2017.
[122] S. Veggalam, S. Rawat, I. Haller, and H. Bos, “IFuzzer: An evolutionary
interpreter fuzzer using genetic programming,” in Proc. Eur. Symp. Res.
Comput. Security, vol. 9878, 2016, pp. 581–601.
[123] Project Springfield, Springfield. [Online]. Available: https://www.
microsoft.com/en-us/springfield/. Accessed on: Apr. 15, 2017.
[124] N. Stephens et al., “Driller: Augmenting fuzzing through selective symbolic execution,” in Proc. 23nd Annu. Netw. Distrib. Syst. Security Symp.,
San Diego, CA, USA, Feb. 21–24, 2016, pp. 1–16.
[125] Project Triforce: Run AFL on Everything!, TriforceAFL. [Online]. Available:
https://www.nccgroup.trust/us/about-us/newsroom-and-events/
blog/2016/june/project-triforce-run-afl-on-everything/. Accessed on:
Jul. 13, 2016.
[126] ivanfratric/winafl, WinAFL, GitHub. [Online]. Available: https://github.
com/ivanfratric/winafl. Accessed on: Dec. 16, 2016.
[127] AFL Filesystem Fuzzing, Vault 2016_0.pdf, Oracle Linux and VM
Development, 2016. [Online]. Available: http://events.linuxfoundation.
org/sites/events/files/slides/AFL%20filesystem%20fuzzing%2C%20
Vault%202016_0.pdf. Accessed on: Jul. 13, 2016.
[128] G. Grieco, M. Ceresa, and P. Buiras, “QuickFuzz: An automatic random
fuzzer for common file formats,” in Proc. Int. Symp. Haskell, 2016,
pp. 13–20.
[129] J. Davis, A. Thekumparampil, and D. Lee, “Node.Fz: Fuzzing the serverside event-driven architecture,” in Proc. 12th Eur. Conf. Comput. Syst.,
New York, NY, USA, 2017, pp. 145–160.
[130] M. Marhefka and P. Müller, “Dfuzzer: A D-bus service fuzzing tool,” in
Proc. IEEE 7th Int. Conf. Softw. Test. Verification Validation Workshops,
2014, pp. 383–389.
[131] A. Joseph, “Droid-FF: The first android fuzzing framework,” in Proc.
Hack Box Security Conf., Amsterdam, The Netherlands, 2016. [Online]. Available: http://conference.hitb.org/hitbsecconf2016ams/sessions/
hitb-lab-droid-ff-the-first-android-fuzzing-framework/
[132] H. Shahriar, S. North, and E. Mawangi, “Testing of memory leak in
android applications,” in Proc. IEEE 15th Int. Symp. High-Assurance
Syst. Eng., 2014, pp. 176–183.
[133] H. Ye, S. Cheng, L. Zhang, and F. Jiang, “DroidFuzzer: Fuzzing the android apps with intent-filter tag,” in Proc. Int. Conf. Adv. Mobile Comput.
Multimed., 2013, pp. 1–7.
[134] R. Sasnauskas and J. Regehr, “Intent fuzzer: Crafting intents of death,”
in Proc. Joint Int. Workshop Dyn. Anal. Softw. Syst. Perform. Test.
Debugging Anal., 2014, pp. 1–5.
[135] D. Amalfitano, N. Amatucci, A. R. Fasolino, P. Tramontana, E.
Kowalczyk, and A. M. Memon, “Exploiting the saturation effect in automatic random testing of android applications,” in Proc. 2nd ACM Int.
Conf. Mobile Softw. Eng. Syst., 2015, pp. 33–43.
[136] OpenRCE/sulley, Sulley, GitHub. [Online]. Available: https://github.
com/OpenRCE/sulley. Accessed on: Jul. 12, 2016.
[137] jtpereyda/boofuzz, Boofuzz, GitHub. [Online]. Available: https://github.
com/jtpereyda/boofuzz. Accessed on: Jul. 23, 2016.
[138] J. Somorovsky, “Systematic fuzzing and testing of TLS libraries,” in
Proc. 2016 ACM SIGSAC Conf. Comput. Commun. Security, New York,
NY, USA, 2016, pp. 1492–1504.
[139] D. Aitel, “The advantages of block-based protocol analysis for security
testing,” Immunity Inc., vol. 105, pp. 349–352, 2002.
[140] T. Rontti, A. M. Juuso, and A. Takanen, “Preventing DoS attacks in NGN
networks with proactive specification-based fuzzing,” IEEE Commun.
Mag., vol. 50, no. 9, pp. 164–170, Sep. 2012.
[141] W. Johansson, M. Svensson, U. E. Larson, M. Almgren, and V. Gulisano,
“T-Fuzz: Model-based fuzzing for robustness testing of telecommunication protocols,” in Proc. IEEE 7th Int. Conf. Softw. Test. Verification
Validation, 2014, pp. 323–332.
[142] P. Tsankov, M. T. Dashti, and D. Basin, “SecFuzz: Fuzz-testing security protocols,” in Proc. 7th Int. Workshop Autom. Softw. Test, Zurich,
Switzerland, 2012, pp. 1–7.
[143] G. Banks, M. Cova, V. Felmetsger, K. C. Almeroth, R. A. Kemmerer, and G. Vigna, “SNOOZE: Toward a Stateful NetwOrk prOtocol fuzZEr,” in Proc. Int. Conf. Inf. Security, vol. 4176, 2006, pp. 343–
358.
[144] H. J. Abdelnur, R. State, and O. Festor, “KiF: A stateful SIP fuzzer,”
in Proc. 1st Int. Conf. Principles Syst. Appl. IP Telecommun., 2007,
pp. 47–56.
[145] H. J. Abdelnur, R. State, and O. Festor, “Advanced fuzzing in the VoIP
space,” J. Comput. Virol., vol. 6, no. 1, pp. 57–64, 2010.
[146] A. Prakash, E. Venkataramani, H. Yin, and Z. Lin, “Manipulating semantic values in kernel data structures: Attack assessments and implications,”
in Proc. 43rd Annu. IEEE/IFIP Int. Conf. Dependable Syst. Netw., 2013,
pp. 1–12.
[147] D. Vyukov, Trinity: A Linux System Call Fuzzer, 2016. [Online]. Available: http://codemonkey.org.uk/projects/trinity/. Accessed on: Jul. 12,
2016.
[148] GitHub—Cr4sh/ioctlfuzzer: Automatically Exported From code.google.
com/p/ioctlfuzzer, IOCTL. [Online]. Available: https://github.com/
Cr4sh/ioctlfuzzer. Accessed on: Jul. 13, 2016.
[149] S. Schumilo, C. Aschermann, R. Gawlik, S. Schinzel, and T. Holz,
“kAFL: Hardware-assisted feedback fuzzing for OS kernels,” in Proc.
26th USENIX Security Symp., Vancouver, BC, Canada, 2017, pp. 167–
182.
[150] S. Y. Kim et al., “CAB-Fuzz: Practical concolic testing techniques for
COTS operating systems,” in Proc. USENIX Annu. Tech. Conf., Santa
Clara, CA, USA, 2017, pp. 689–701.
[151] vUSBf–QEMU/KEMU USB-Fuzzing Framework, hucktech, Firmware
Security, Feb. 8, 2016.
[152] R. van Tonder and H. A. Engelbrecht, “Lowering the USB fuzzing barrier by transparent two-way emulation,” in Proc. USENIX Workshop
Offensive Technol., 2014, pp. 1–8.
[153] A. Henderson, H. Yin, G. Jin, H. Han, and H. Deng, “VDF: Targeted evolutionary fuzz testing of virtual devices,” in Proc. 20th Int.
Symp. Res. Attacks Intrusions Defenses, Atlanta, GA, USA, 2017,
pp. 3–25.
[154] perf_fuzzer perf_event syscall fuzzer, perf_fuzzer. [Online]. Available: http://web.eece.maine.edu/~vweaver/projects/perf_events/fuzzer/.
Accessed on: Jul. 13, 2016.
[155] libFuzzer—A Library for Coverage-Guided Fuzz Testing. LLVM 3.9
Documentation, libFuzzer. [Online]. Available: http://www.llvm.org/
docs/LibFuzzer.html. Accessed on: Jul. 13, 2016.
[156] A. G. Voyiatzis, K. Katsigiannis, and S. Koubias, “A modbus/TCP fuzzer
for testing internetworked industrial systems,” in Proc. IEEE 20th Conf.
Emerg. Technol. Factory Autom., 2015, pp. 1–6.
[157] F. L. Sang, V. Nicomette, and Y. Deswarte, “A tool to analyze potential
I/O attacks against PCs,” IEEE Security Privacy, vol. 12, no. 2, pp. 60–66,
Mar. 2014.
[158] B. Chandrasekar, B. Ramesh, V. Prabhu, S. Sajeev, P. K. Mohanty, and
G. Shobha, “Development of intelligent digital certificate fuzzer tool,”
in Proc. Int. Conf. Cryptogr. Security Privacy, Wuhan, China, 2017,
pp. 126–130.
[159] A. Case, A. K. Das, S.-J. Park, J. R. Ramanujam, and G. G. Richard III,
“Gaslight: A comprehensive fuzzing architecture for memory forensics
frameworks,” Digit. Investigation, vol. 22, pp. S86–S93, 2017.
[160] E. Stepanov and K. Serebryany, “MemorySanitizer: Fast detector of
uninitialized memory use in C++,” in Proc. 13th Annu. IEEE/ACM Int.
Symp. Code Gener. Optim., San Francisco, CA, USA, Feb. 7–11, 2015,
pp. 46–55.
[161] E. Poll, J. D. Ruiter, and A. Schubert, “Protocol state machines and
session languages: Specification, implementation, and security flaws,” in
Proc. IEEE Security Privacy Workshops, 2015, pp. 125–133.
[162] S. Rawat and L. Mounier, “An evolutionary computing approach for
hunting buffer overflow vulnerabilities: A case of aiming in dim light,”
in Proc. Eur. Conf. Comput. Netw. Defense, 2010, pp. 37–45.
[163] R. B. Dannenberg et al., “As-If infinitely ranged integer model,” in Proc.
IEEE 21st Int. Symp. Softw. Rel. Eng., 2010, pp. 91–100.
[164] T. Wang, T. Wei, Z. Lin, and W. Zou, “IntScope: Automatically detecting
integer overflow vulnerability in X86 binary using symbolic execution,”
in Proc. Netw. Distrib. Syst. Security Symp., 2009, pp. 1–14.
[165] S. Palka and D. McCoy, “Fuzzing E-mail filters with generative grammars and n-gram analysis,” in Proc. Workshop Offensive Technol., 2015,
pp. 1–10.
[166] M. Mouzarani, B. Sadeghiyan, and M. Zolfaghari, “A smart fuzzing
method for detecting heap-based buffer overflow in executable codes,”
in Proc. IEEE 21st Pacific Rim Int. Symp. Dependable Comput., 2015,
pp. 42–49.
[167] C. C. Yeh, H. Chung, and S. K. Huang, “CRAXfuzz: Target-aware symbolic fuzz testing,” in Proc. IEEE 39th Annu. Comput. Softw. Appl. Conf.,
vol. 2, 2015, pp. 460–471.
[168] S. K. Huang, M. H. Huang, P. Y. Huang, H. L. Lu, and C. W. Lai, “Software crash analysis for automatic exploit generation on binary programs,”
IEEE Trans. Rel., vol. 63, no. 1, pp. 270–289, Mar. 2014.
[169] B. Dolan-Gavitt et al., “LAVA: Large-scale automated vulnerability addition,” in Proc. IEEE Symp. Security Privacy, 2016, pp. 110–121.
[170] D. Zhang et al., “SimFuzz: Test case similarity directed deep fuzzing,”
J. Syst. Softw., vol. 85, no. 1, pp. 102–111, 2012.
[171] W. M. McKeeman, “Differential testing for software,” Digit. Tech. J.,
vol. 10, no. 1, pp. 100–107, 1998.
[172] S. Kyle, H. Leather, B. Franke, D. Butcher, and S. Monteith, “Application
of domain-aware binary fuzzing to aid Android virtual machine testing,”
in Proc. 11th ACM SIGPLAN/SIGOPS Int. Conf. Virtual Execution Environ., New York, NY, USA, 2015, pp. 121–132.
[173] T. Y. Chen, F.-C. Kuo, H. Liu, and W. E. Wong, “Code coverage of
adaptive random testing,” IEEE Trans. Rel., vol. 62, no. 1, pp. 226–237,
Mar. 2013.
[174] T. Y. Chen, F.-C. Kuo, and H. Liu, “Application of a failure driven test
profile in random testing,” IEEE Trans. Rel., vol. 58, no. 1, pp. 179–192,
Mar. 2009.
[175] A. F. Tappenden and J. Miller, “A novel evolutionary approach for adaptive random testing,” IEEE Trans. Rel., vol. 58, no. 4, pp. 619–633, Dec.
2009.
[176] E. Rogstad and L. C. Briand, “Clustering deviations for black box regression testing of database applications,” IEEE Trans. Rel., vol. 65, no. 1,
pp. 4–18, Mar. 2016.
[177] Y. Chen, T. Su, C. Sun, Z. Su, and J. Zhao, “Coverage-directed differential testing of JVM implementations,” in Proc. 37th ACM SIGPLAN
Conf. Program. Lang. Des. Implementation, New York, NY, USA, 2016,
pp. 85–99.
[178] B. Shastry et al., “Static program analysis as a fuzzing aid,” in Proc. 20th
Int. Symp. Res. Attacks Intrusions Defenses, Atlanta, GA, USA, 2017,
pp. 26–47.
[179] V.-T. Pham, M. Böhme, and A. Roychoudhury, “Model-based whitebox
fuzzing for program binaries,” in Proc. 31st IEEE/ACM Int. Conf. Autom.
Softw. Eng., New York, NY, USA, 2016, pp. 543–553.
[180] T. Y. Chen et al., “Metamorphic testing for cybersecurity,” Computer,
vol. 49, no. 6, pp. 48–55, Jun. 2016.
[181] T. Y. Chen, T. H. Tse, and Z. Zhou, “Fault-based testing without the need
of oracles,” Inf. Softw. Technol., vol. 45, no. 1, pp. 1–9, 2003.
[182] H. Liu, F.-C. Kuo, D. Towey, and T. Y. Chen, “How effectively does
metamorphic testing alleviate the oracle problem?” IEEE Trans. Softw.
Eng., vol. 40, no. 1, pp. 4–22, Jan. 2014.
[183] T. Y. Chen et al., “Metamorphic testing: A review of challenges and
opportunities,” ACM Comput. Surv., vol. 51, no. 1, pp. 4:1–4:27, Jan.
2018.
[184] B. Garn and D. E. Simos, “Eris: A tool for combinatorial testing of the
Linux system call interface,” in Proc. IEEE 7th Int. Conf. Softw. Test.
Verification Validation Workshops, 2014, pp. 58–67.
[185] B. Shastry, F. Maggi, F. Yamaguchi, K. Rieck, and J.-P. Seifert,
“Static exploration of taint-style vulnerabilities found by fuzzing,”
in Proc. 11th USENIX Workshop Offensive Technol., Vancouver, BC,
Canada, 2017.
[186] L. Ma, C. Artho, C. Zhang, H. Sato, J. Gmeiner, and R. Ramler, “GRT:
Program-analysis-guided random testing (T),” in Proc. 30th IEEE/ACM
Int. Conf. Autom. Softw. Eng., Lincoln, NE, USA, Nov. 9–13,
2015, pp. 212–223.
[187] E. Alatawi, T. Miller, and H. Søndergaard, “Using metamorphic testing
to improve dynamic symbolic execution,” in Proc. 24th Australas. Softw.
Eng. Conf., 2015, pp. 38–47.
[188] N. Havrikov, “Efficient fuzz testing leveraging input, code, and execution,” in Proc. 39th Int. Conf. Softw. Eng., Buenos Aires, Argentina,
May 20–28, 2017, pp. 417–420.
[189] P. Godefroid, H. Peleg, and R. Singh, “Learn&Fuzz: Machine learning
for input fuzzing,” in Proc. 32nd IEEE/ACM Int. Conf. Autom. Softw.
Eng., Urbana, IL, USA, Oct. 30–Nov. 3, 2017, pp. 50–59.
[190] K. Böttinger, “Fuzzing binaries with Lévy flight swarms,” EURASIP J.
Inf. Security, vol. 2016, no. 1, Nov. 2016, Art. no. 28.
[191] K. Böttinger, “Hunting bugs with Lévy flight foraging,” in Proc. IEEE
Security Privacy Workshops, 2016, pp. 111–117.
[192] F. Duchene, S. Rawat, J.-L. Richier, and R. Groz, “KameleonFuzz: Evolutionary fuzzing for black-box XSS detection,” in Proc. ACM Conf.
Data Appl. Security Privacy, 2014, pp. 37–48.
[193] F. Duchene, R. Groz, S. Rawat, and J. L. Richier, “XSS vulnerability
detection using model inference assisted evolutionary fuzzing,” in Proc.
IEEE 5th Int. Conf. Softw. Test. Verification Validation, 2012, pp. 815–
817.
[194] S. Dai, A. Tongaonkar, X. Wang, A. Nucci, and D. Song, “NetworkProfiler: Towards automatic fingerprinting of Android apps,” in Proc. IEEE
Int. Conf. Comput. Commun., 2013, pp. 809–817.
Hongliang Liang (M’14) received the Ph.D. degree
in computer science from the University of Chinese
Academy of Sciences, Beijing, China, in 2002.
He is currently an Associate Professor with
Beijing University of Posts and Telecommunications,
Beijing, China. His research interests include system
software, program analysis, software security, and artificial intelligence.
Dr. Liang is a senior member of China Computer
Federation.
Xiaoxiao Pei received the M.Sc. degree in computer
science from the Beijing University of Posts and
Telecommunications, Beijing, China, in 2018.
Her research interests include fuzzing and symbolic execution.
Xiaodong Jia received the M.Sc. degree in computer science from the Beijing University of Posts
and Telecommunications, Beijing, China, in 2018.
Her research interests include taint analysis and
Android security.
Wuwei Shen (M’00) received the Ph.D. degree in
computer science from the Department of Electrical Engineering and Computer Science, University
of Michigan, Ann Arbor, MI, USA, in 2001.
He is currently an Associate Professor with Western Michigan University, Kalamazoo, MI, USA. His
research interests include object-oriented analysis
and design modeling, model consistency checking,
model-based software testing, assurance-based software development, and software certification.
Dr. Shen received research faculty fellowships
from the Air Force Research Laboratory in 2015 and 2016. He was the recipient of
a senior research award from the U.S. National Research Council Research
Associateship Programs to work on assurance-based software development
for mission-critical systems at the AFRL, Rome, NY, USA, in 2017 and 2018.
Jian Zhang (SM’09) is a Research Professor with
the Institute of Software, Chinese Academy of Sciences, Beijing, China. His main research interests
include automated reasoning, constraint satisfaction,
program analysis, and software testing.
He has served on the program committees of
some 70 international conferences. He also serves
on the editorial boards of several journals, including
the IEEE TRANSACTIONS ON RELIABILITY, Frontiers
of Computer Science, and the Journal of Computer
Science and Technology. He is a senior member of
ACM and is a distinguished member of the China Computer Federation.