Automatically generating precise oracles from structured natural language specifications

Authors: Manish Motwani, Yuriy Brun

ICSE '19: Proceedings of the 41st International Conference on Software Engineering

Pages 188 - 199

https://doi.org/10.1109/ICSE.2019.00035

Published: 25 May 2019

Abstract

Software specifications often use natural language to describe desired behavior, but such specifications are difficult to verify automatically. We present Swami, an automated technique that extracts test oracles and generates executable tests from structured natural language specifications. Swami focuses on exceptional behavior and boundary conditions, which often cause field failures but for which developers often fail to write tests manually. In an evaluation on the official JavaScript specification (ECMA-262), 98.4% of the tests Swami generated were precise to the specification. Using Swami to augment developer-written test suites improved coverage and identified 1 previously unknown defect and 15 missing JavaScript features in Rhino, 1 previously unknown defect in Node.js, and 18 semantic ambiguities in the ECMA-262 specification.
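
To make the idea of extracting an oracle from a structured specification concrete, the following minimal sketch turns a spec step of the form "If <condition>, throw a <Error> exception." into a JavaScript test template. It is an illustration only and not Swami's implementation: the regular expression, the spec steps (written in the style of ECMA-262, not quoted from it), and the function name `makeList` are all assumptions made for this example.

```python
import re

# Illustrative spec steps in the style of ECMA-262 (hypothetical, not verbatim).
SPEC_STEPS = [
    "1. Let len be ToNumber(length).",
    "2. If len < 0, throw a RangeError exception.",
    "3. Return a new list of len elements.",
]

# Structured form: "If <condition>, throw a/an <ErrorType> exception."
THROW_RULE = re.compile(r"If (?P<cond>.+?), throw an? (?P<err>\w+) exception\.")


def extract_oracles(steps):
    """Return (condition, error_type) pairs for steps that mandate an exception."""
    oracles = []
    for step in steps:
        match = THROW_RULE.search(step)
        if match:
            oracles.append((match.group("cond"), match.group("err")))
    return oracles


def render_js_test(func_name, condition, error_type):
    """Render a JavaScript test template that checks the mandated exception."""
    return (
        f"function test_{func_name}(len) {{\n"
        f"  if ({condition}) {{\n"
        f"    try {{\n"
        f"      {func_name}(len);\n"
        f"      return false;  // no exception: violates the spec\n"
        f"    }} catch (e) {{\n"
        f"      return e instanceof {error_type};\n"
        f"    }}\n"
        f"  }}\n"
        f"  return true;  // condition not triggered: oracle is silent\n"
        f"}}"
    )


if __name__ == "__main__":
    for cond, err in extract_oracles(SPEC_STEPS):
        print(render_js_test("makeList", cond, err))
```

The generated template would still need concrete inputs (for example, boundary values such as negative lengths) before it becomes an executable test; input generation is outside the scope of this sketch.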

Published In

ICSE '19: Proceedings of the 41st International Conference on Software Engineering

May 2019

1318 pages

General Chair: Joanne M. Atlee (University of Waterloo, Canada)

Program Chairs: Tevfik Bultan (University of California, Santa Barbara), Jon Whittle (Monash University, Australia)

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering
IEEE-CS: Computer Society

Publisher

IEEE Press

Conference

ICSE '19: 41st International Conference on Software Engineering

May 25 - 31, 2019

Montreal, Quebec, Canada

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%
