research-article

Automatic mining of specifications from invocation traces and method invariants

Authors:

Nenad Medvidovic

FSE 2014: Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering

Pages 178 - 189

https://doi.org/10.1145/2635868.2635890

Published: 11 November 2014

Abstract

Software library documentation often describes individual methods' APIs, but not the intended protocols and method interactions. This can lead to library misuse, and restrict runtime detection of protocol violations and automated verification of software that uses the library. Specification mining, if accurate, can help mitigate these issues, which has led to significant research into new model-inference techniques that produce FSM-based models from program invariants and execution traces. However, there is currently a lack of empirical studies that, in a principled way, measure the impact of the inference strategies on model quality. To this end, we identify four such strategies and systematically study the quality of the models they produce for nine off-the-shelf libraries. We find that (1) using invariants to infer an initial model significantly improves model quality, increasing precision by 4% and recall by 41%, on average; (2) effective invariant filtering is crucial for quality and scalability of strategies that use invariants; and (3) using traces in combination with invariants greatly improves robustness to input noise. We present our empirical evaluation, implement new and extend existing model-inference techniques, and make public our implementations, ground-truth models, and experimental data. Our work can lead to higher-quality model inference, and directly improve the techniques and tools that rely on model inference.
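
To make the idea of trace-based FSM inference and its evaluation concrete, the following is a minimal sketch, not one of the four strategies studied in the paper. It assumes a deliberately simple abstraction in which each state is identified by the last invoked method, and it scores the model by the fraction of sample traces it accepts. The names `infer_fsm`, `accepts`, and `acceptance_rate`, and the file-like `open`/`read`/`close` traces, are hypothetical illustrations.

```python
from collections import defaultdict

START = "<start>"  # hypothetical marker for the state before any invocation


def infer_fsm(traces):
    """Infer a transition map from invocation traces.

    Assumption: a state is identified by the last invoked method,
    so the model only records which method may follow which.
    """
    transitions = defaultdict(set)
    for trace in traces:
        state = START
        for method in trace:
            transitions[state].add(method)
            state = method
    return dict(transitions)


def accepts(fsm, trace):
    """Return True if every step of the trace follows an inferred transition."""
    state = START
    for method in trace:
        if method not in fsm.get(state, set()):
            return False
        state = method
    return True


def acceptance_rate(fsm, traces):
    """Fraction of the given traces that the model accepts."""
    traces = list(traces)
    return sum(accepts(fsm, t) for t in traces) / len(traces)


# Observed invocation traces of a hypothetical file-like library.
observed = [
    ["open", "read", "close"],
    ["open", "write", "close"],
]
model = infer_fsm(observed)

# Hand-written samples standing in for a ground-truth protocol.
legal = [["open", "read", "close"], ["open", "read", "read", "close"]]
illegal = [["read", "close"], ["open", "close", "read"]]

recall_like = acceptance_rate(model, legal)            # legal behavior accepted
rejected_illegal = 1 - acceptance_rate(model, illegal)  # illegal behavior rejected
print(recall_like, rejected_illegal)
```

In this sketch the model rejects the legal trace with two consecutive `read` calls because that sequence never appeared in the observed traces. This illustrates why a trace-only model can have low recall, and why additional information such as invariants may help.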

Index Terms

Software and its engineering
  Software creation and management
    Software verification and validation
      Software defect analysis
        Software testing and debugging

Published In

FSE 2014: Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering

November 2014

856 pages

ISBN: 9781450330565

DOI: 10.1145/2635868

General Chair: Shing-Chi Cheung, Hong Kong University of Science and Technology, China

Program Chairs: Alessandro Orso, Georgia Institute of Technology, USA; Margaret-Anne Storey, University of Victoria, Canada

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SIGSOFT/FSE'14

Sponsor:

SIGSOFT

SIGSOFT/FSE'14: 22nd ACM SIGSOFT Symposium on the Foundations of Software Engineering

November 16 - 21, 2014

Hong Kong, China

Acceptance Rates

Overall Acceptance Rate 17 of 128 submissions, 13%
