DOI: 10.1145/2884781.2884835

On the techniques we create, the tools we build, and their misalignments: a study of KLEE

Published: 14 May 2016

Abstract

Our community constantly pushes the state-of-the-art by introducing "new" techniques. These techniques often build on top of, and are compared against, existing systems that realize previously published techniques. The underlying assumption is that existing systems correctly represent the techniques they implement. This paper examines that assumption through a study of KLEE, a popular and well-cited tool in our community. We briefly describe six improvements we made to KLEE, none of which can be considered "new" techniques, that provide order-of-magnitude performance gains. Given these improvements, we then investigate how the results and conclusions of a sample of papers that cite KLEE are affected. Our findings indicate that the strong emphasis on introducing "new" techniques may lead to wasted effort, missed opportunities for progress, an accretion of artifact complexity, and questionable research conclusions (in our study, 27% of the papers that depend on KLEE can be questioned). We conclude by revisiting initiatives that may help to realign the incentives to better support the foundations on which we build.
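
For context, KLEE symbolically executes LLVM bitcode and generates one concrete test case per explored program path. The sketch below is not drawn from the paper or from the improvements it describes; it is a minimal, illustrative example of how KLEE is typically driven, in the style of the tool's standard tutorial. File names, include paths, and command-line flags are assumptions and vary across KLEE and LLVM versions.

    /* get_sign.c -- minimal illustrative KLEE target (not from the paper). */
    #include <klee/klee.h>

    int get_sign(int x) {
        if (x == 0)
            return 0;
        if (x < 0)
            return -1;
        return 1;
    }

    int main(void) {
        int a;
        /* Mark 'a' as symbolic so KLEE treats it as an unconstrained input
         * and forks execution at each branch that depends on it. */
        klee_make_symbolic(&a, sizeof(a), "a");
        return get_sign(a);
    }

A typical (version-dependent) run compiles the program to LLVM bitcode and then executes it under KLEE, for example clang -I <klee-include-dir> -emit-llvm -c -g -O0 get_sign.c followed by klee get_sign.bc; for this example KLEE explores three paths (x == 0, x < 0, x > 0) and emits one test case for each.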




Reviews

Richard John Botting

Software maintenance research seems to get no respect, and it is a minor miracle that the academic publish-or-perish system produced the data in this paper. This data indicates that ignoring maintenance distorts some academic research. The authors took an open-source software tool (the KLEE test generator) and applied half a dozen fairly obvious bug fixes and tweaks. The resulting version performed ten times faster! They then looked at the corpus of 100 research papers that mention the tool. None mentioned the researchers' fixes, which does not surprise me, because novelty is rewarded more than repair. Next, they tried to see whether the published results would have changed if the maintenance had been done first. Seventy-four papers referred to KLEE without modifying it. Twelve papers were robust enough not to need replication. In two papers it was possible to replicate the research using the properly maintained code, and in six more they could approximately duplicate it. In seven of these eight cases, the conclusions would have changed. In other words, doing maintenance before pursuing a novel change would have been a good idea. The paper makes some recommendations: reward the publication and review of artifacts as well as papers, add special conference tracks for maintenance, provide institutional support for maintaining software, and so on. The authors studied only KLEE; I suspect similar results hold elsewhere. It would be good if other groups could replicate the authors' methodology on other published code.

Online Computing Reviews Service


Published In

ICSE '16: Proceedings of the 38th International Conference on Software Engineering
May 2016
1235 pages
ISBN: 9781450339001
DOI: 10.1145/2884781

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. replication
  2. research incentives
  3. research tools and infrastructure

Qualifiers

  • Research-article

Conference

ICSE '16

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%



Cited By

  • (2024) A Transferability Study of Interpolation-Based Hardware Model Checking for Software Verification. Proceedings of the ACM on Software Engineering, 1(FSE): 2028-2050. DOI: 10.1145/3660797. Online publication date: 12-Jul-2024.
  • (2024) DarthShader: Fuzzing WebGPU Shader Translators & Compilers. Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security, 690-704. DOI: 10.1145/3658644.3690209. Online publication date: 2-Dec-2024.
  • (2024) Concrete Constraint Guided Symbolic Execution. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, 1-12. DOI: 10.1145/3597503.3639078. Online publication date: 20-May-2024.
  • (2024) Netfuzzlib: Adding First-Class Fuzzing Support to Network Protocol Implementations. Computer Security – ESORICS 2024, 65-84. DOI: 10.1007/978-3-031-70890-9_4. Online publication date: 16-Sep-2024.
  • (2023) Continuously Accelerating Research. Proceedings of the 45th International Conference on Software Engineering: New Ideas and Emerging Results, 123-128. DOI: 10.1109/ICSE-NIER58687.2023.00028. Online publication date: 17-May-2023.
  • (2023) UnitTestBot: Automated Unit Test Generation for C Code in Integrated Development Environments. Proceedings of the 45th International Conference on Software Engineering: Companion Proceedings, 380-384. DOI: 10.1109/ICSE-Companion58688.2023.00107. Online publication date: 14-May-2023.
  • (2023) Building an open-source system test generation tool: lessons learned and empirical analyses with EvoMaster. Software Quality Journal, 31(3): 947-990. DOI: 10.1007/s11219-023-09620-w. Online publication date: 6-Mar-2023.
  • (2022) Conditional Quantitative Program Analysis. IEEE Transactions on Software Engineering, 48(4): 1212-1227. DOI: 10.1109/TSE.2020.3016778. Online publication date: 1-Apr-2022.
  • (2021) Input Test Suites for Program Repair: A Novel Construction Method Based on Metamorphic Relations. IEEE Transactions on Reliability, 70(1): 285-303. DOI: 10.1109/TR.2020.3003313. Online publication date: Mar-2021.
  • (2021) Fuzzing: Challenges and Reflections. IEEE Software, 38(3): 79-86. DOI: 10.1109/MS.2020.3016773. Online publication date: May-2021.
