
Experience report: how is dynamic symbolic execution different from manual testing? a study on KLEE

Published: 13 July 2015

Abstract

Software testing has been the major approach to software quality assurance for decades, but it typically involves intensive manual effort. To reduce this effort, researchers have proposed numerous approaches to automate test-case generation, one of the most time-consuming tasks in software testing. One of the most recent achievements in this area is Dynamic Symbolic Execution (DSE), and tools based on DSE, such as KLEE, have been reported to generate test suites achieving higher code coverage than manually developed test suites. However, beyond code coverage, there have been few studies comparing DSE-based test suites with manually developed test suites more thoroughly on various metrics to understand the detailed differences between the two testing methodologies. In this paper, we revisit the experimental study on the KLEE tool and GNU CoreUtils programs, and compare KLEE-based test suites with manually developed test suites on various aspects. We further carried out a qualitative study to investigate the reasons behind the differences in the statistical results. The results of our studies show that while KLEE-based test suites achieve higher code coverage, they are less effective at covering hard-to-cover code and killing mutants. Furthermore, our qualitative study reveals that KLEE-based test suites have advantages in exploring error-handling code and exhausting options, but are less effective at generating valid string inputs and exploring meaningful program behaviors.




Published In

ISSTA 2015: Proceedings of the 2015 International Symposium on Software Testing and Analysis
July 2015, 447 pages
ISBN: 9781450336208
DOI: 10.1145/2771783
General Chair: Michal Young
Program Chair: Tao Xie

Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. Dynamic Symbolic Execution
    2. Empirical Study
    3. Manual Testing

    Qualifiers

    • Research-article

Conference

ISSTA '15

Acceptance Rates

Overall Acceptance Rate: 58 of 213 submissions, 27%


Cited By

    • (2024) Broadly Enabling KLEE to Effortlessly Find Unrecoverable Errors in Rust. In Proceedings of the 46th International Conference on Software Engineering: Software Engineering in Practice, pages 441-451. DOI: 10.1145/3639477.3639714. Online publication date: 14-Apr-2024.
    • (2024) A Review of the Applications of Heuristic Algorithms in Test Case Generation Problem. In 2024 IEEE 24th International Conference on Software Quality, Reliability, and Security Companion (QRS-C), pages 856-865. DOI: 10.1109/QRS-C63300.2024.00114. Online publication date: 1-Jul-2024.
    • (2023) UnitTestBot: Automated Unit Test Generation for C Code in Integrated Development Environments. In Proceedings of the 45th International Conference on Software Engineering: Companion Proceedings, pages 380-384. DOI: 10.1109/ICSE-Companion58688.2023.00107. Online publication date: 14-May-2023.
    • (2022) Human-based Test Design versus Automated Test Generation: A Literature Review and Meta-Analysis. In Proceedings of the 15th Innovations in Software Engineering Conference, pages 1-11. DOI: 10.1145/3511430.3511433. Online publication date: 24-Feb-2022.
    • (2022) Characterizing and Improving Bug-Finders with Synthetic Bugs. In 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), pages 971-982. DOI: 10.1109/SANER53432.2022.00115. Online publication date: Mar-2022.
    • (2021) LeanSym: Efficient Hybrid Fuzzing Through Conservative Constraint Debloating. In Proceedings of the 24th International Symposium on Research in Attacks, Intrusions and Defenses, pages 62-77. DOI: 10.1145/3471621.3471852. Online publication date: 6-Oct-2021.
    • (2021) Research on Optimization of Fuzzing Technology Based on Hybrid Execution. In 2021 IEEE 2nd International Conference on Big Data, Artificial Intelligence and Internet of Things Engineering (ICBAIE), pages 605-608. DOI: 10.1109/ICBAIE52039.2021.9389853. Online publication date: 26-Mar-2021.
    • (2020) A Practical Concolic Execution Technique for Large Scale Software Systems. In Proceedings of the 24th International Conference on Evaluation and Assessment in Software Engineering, pages 312-317. DOI: 10.1145/3383219.3383254. Online publication date: 15-Apr-2020.
    • (2019) TestMig: Migrating GUI Test Cases from iOS to Android. In Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 284-295. DOI: 10.1145/3293882.3330575. Online publication date: 10-Jul-2019.
    • (2019) Classifying Generated White-Box Tests: An Exploratory Study. Software Quality Journal, 27(3):1339-1380. DOI: 10.1007/s11219-019-09446-5. Online publication date: 17-May-2019.
