
Experience report: how is dynamic symbolic execution different from manual testing? a study on KLEE

Published: 13 July 2015

Abstract

Software testing has been the major approach to software quality assurance for decades, but it typically involves intensive manual effort. To reduce this effort, researchers have proposed numerous approaches to automate test-case generation, one of the most time-consuming tasks in software testing. One of the most recent achievements in this area is Dynamic Symbolic Execution (DSE), and tools based on DSE, such as KLEE, have been reported to generate test suites achieving higher code coverage than manually developed test suites. However, beyond code coverage, there have been few studies comparing DSE-based test suites with manually developed test suites more thoroughly on various metrics to understand the detailed differences between the two testing methodologies. In this paper, we revisit the experimental study on the KLEE tool and GNU CoreUtils programs, and compare KLEE-based test suites with manually developed test suites on various aspects. We further carried out a qualitative study to investigate the reasons behind the differences in the statistical results. The results of our studies show that while KLEE-based test suites achieve higher code coverage, they are less effective at covering hard-to-cover code and killing mutants. Furthermore, our qualitative study reveals that KLEE-based test suites have advantages in exploring error-handling code and exhausting options, but are less effective at generating valid string inputs and exploring meaningful program behaviors.




Published In

ISSTA 2015: Proceedings of the 2015 International Symposium on Software Testing and Analysis
July 2015, 447 pages
ISBN: 9781450336208
DOI: 10.1145/2771783
General Chair: Michal Young
Program Chair: Tao Xie

Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. Dynamic Symbolic Execution
    2. Empirical Study
    3. Manual Testing

    Qualifiers

    • Research-article

Conference

ISSTA '15

Acceptance Rates

Overall Acceptance Rate: 58 of 213 submissions, 27%


Cited By

    • (2024) Broadly Enabling KLEE to Effortlessly Find Unrecoverable Errors in Rust. In Proceedings of the 46th International Conference on Software Engineering: Software Engineering in Practice, pages 441-451. DOI: 10.1145/3639477.3639714. Online publication date: 14-Apr-2024.
    • (2024) A Review of the Applications of Heuristic Algorithms in Test Case Generation Problem. In 2024 IEEE 24th International Conference on Software Quality, Reliability, and Security Companion (QRS-C), pages 856-865. DOI: 10.1109/QRS-C63300.2024.00114. Online publication date: 1-Jul-2024.
    • (2023) UnitTestBot: Automated Unit Test Generation for C Code in Integrated Development Environments. In Proceedings of the 45th International Conference on Software Engineering: Companion Proceedings, pages 380-384. DOI: 10.1109/ICSE-Companion58688.2023.00107. Online publication date: 14-May-2023.
    • (2022) Human-based Test Design versus Automated Test Generation: A Literature Review and Meta-Analysis. In Proceedings of the 15th Innovations in Software Engineering Conference, pages 1-11. DOI: 10.1145/3511430.3511433. Online publication date: 24-Feb-2022.
    • (2022) Characterizing and Improving Bug-Finders with Synthetic Bugs. In 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), pages 971-982. DOI: 10.1109/SANER53432.2022.00115. Online publication date: Mar-2022.
    • (2021) LeanSym: Efficient Hybrid Fuzzing Through Conservative Constraint Debloating. In Proceedings of the 24th International Symposium on Research in Attacks, Intrusions and Defenses, pages 62-77. DOI: 10.1145/3471621.3471852. Online publication date: 6-Oct-2021.
    • (2021) Research on Optimization of Fuzzing Technology Based on Hybrid Execution. In 2021 IEEE 2nd International Conference on Big Data, Artificial Intelligence and Internet of Things Engineering (ICBAIE), pages 605-608. DOI: 10.1109/ICBAIE52039.2021.9389853. Online publication date: 26-Mar-2021.
    • (2020) A Practical Concolic Execution Technique for Large Scale Software Systems. In Proceedings of the 24th International Conference on Evaluation and Assessment in Software Engineering, pages 312-317. DOI: 10.1145/3383219.3383254. Online publication date: 15-Apr-2020.
    • (2019) TestMig: Migrating GUI Test Cases from iOS to Android. In Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 284-295. DOI: 10.1145/3293882.3330575. Online publication date: 10-Jul-2019.
    • (2019) Classifying Generated White-Box Tests: An Exploratory Study. Software Quality Journal, 27(3):1339-1380. DOI: 10.1007/s11219-019-09446-5. Online publication date: 17-May-2019.
