Mining specifications

Authors: Rastislav Bodík, James R. Larus

POPL '02: Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages

Pages 4 - 16

https://doi.org/10.1145/503272.503275

Published: 01 January 2002

Abstract

Program verification is a promising approach to improving program quality, because it can search all possible program executions for specific errors. However, the need to formally describe correct behavior or errors is a major barrier to the widespread adoption of program verification, since programmers historically have been reluctant to write formal specifications. Automating the process of formulating specifications would remove a barrier to program verification and enhance its practicality.

This paper describes specification mining, a machine learning approach to discovering formal specifications of the protocols that code must obey when interacting with an application program interface or abstract data type. Starting from the assumption that a working program is well enough debugged to reveal strong hints of correct protocols, our tool infers a specification by observing program execution and concisely summarizing the frequent interaction patterns as state machines that capture both temporal and data dependences. These state machines can be examined by a programmer, to refine the specification and identify errors, and can be utilized by automatic verification tools, to find bugs.

Our preliminary experience with the mining tool has been promising. We were able to learn specifications that not only captured the correct protocol, but also discovered serious bugs.
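To make the idea of summarizing frequent interaction patterns as a state machine concrete, here is a minimal Python sketch. It is not the paper's learner: it is a deliberately simplified stand-in that counts adjacent call pairs across recorded traces, keeps the pairs seen at least `min_support` times, and treats the result as a small state machine with one state per call name. It captures only temporal ordering and ignores data dependences. The function names (`mine_transitions`, `violations`), the `min_support` threshold, and the stdio-style traces are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Sentinel events marking the beginning and end of every trace (assumed names).
START, END = "<start>", "<end>"


def mine_transitions(traces, min_support=2):
    """Keep the call-to-call transitions seen at least `min_support` times.

    Returns a dict mapping each event to the set of events allowed to follow it.
    """
    counts = Counter()
    for trace in traces:
        events = [START, *trace, END]
        # Count every adjacent pair of events in this trace.
        counts.update(zip(events, events[1:]))

    machine = defaultdict(set)
    for (src, dst), n in counts.items():
        if n >= min_support:
            machine[src].add(dst)
    return dict(machine)


def violations(machine, trace):
    """List the transitions in `trace` that the mined machine does not allow."""
    events = [START, *trace, END]
    return [
        (a, b)
        for a, b in zip(events, events[1:])
        if b not in machine.get(a, set())
    ]


# Hypothetical traces of one file handle's lifetime; the last one never closes it.
traces = [
    ["fopen", "fread", "fread", "fclose"],
    ["fopen", "fread", "fread", "fclose"],
    ["fopen", "fwrite", "fclose"],
    ["fopen", "fwrite", "fclose"],
    ["fopen", "fread"],
]

machine = mine_transitions(traces, min_support=2)
print(violations(machine, ["fopen", "fread"]))            # the rare, leaking pattern
print(violations(machine, ["fopen", "fread", "fclose"]))  # a conforming trace
```

Because the leaking trace occurs only once, its final transition falls below the support threshold and is reported as a deviation from the mined protocol. This mirrors the abstract's premise that a mostly correct program reveals the intended protocol through its frequent behavior. The threshold is a design choice: set too high, it discards legitimate but rare usage; set too low, it admits bugs into the specification.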

Published In

POPL '02: Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages

January 2002

351 pages

ISBN:1581134509

DOI:10.1145/503272

Conference Chair: John Launchbury (OGI and Galois Connections)

Program Chair: John C. Mitchell (Stanford University)

ACM SIGPLAN Notices Volume 37, Issue 1
Jan. 2002
342 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/565816

Copyright © 2002 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

POPL02: The 29th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages 2002

January 16 - 18, 2002

Portland, Oregon

Acceptance Rates

POPL '02 Paper Acceptance Rate: 28 of 128 submissions, 22%

Overall Acceptance Rate 824 of 4,130 submissions, 20%
