Learning-based extraction of first-order logic representations of API directives

Authors:

Andrian Marcus,

Christoph Treude,

Xiaoxin Zhang

ESEC/FSE 2021: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering

Pages 491 - 502

https://doi.org/10.1145/3468264.3468618

Published: 18 August 2021

Abstract

Developers often rely on API documentation to learn API directives, i.e., constraints and guidelines related to API usage. Failing to follow API directives may cause defects or improper implementations. Since there are no industry-wide standards on how to document API directives, they take many forms and are often hard for developers to understand or for tools to parse.

In this paper, we propose a learning-based approach for extracting first-order logic representations of API directives (FOL directives for short). The approach, called LEADFOL, uses a joint learning method to extract atomic formulas by identifying the predicates and arguments involved in directive sentences, and recognizes the logical relations between atomic formulas by parsing the sentence structures. It then parses the arguments and uses a learning-based method to link API references to their corresponding API elements. Finally, it groups the formulas of the same class or method together and transforms them into conjunctive normal form. Our evaluation shows that LEADFOL can accurately extract more FOL directives than a state-of-the-art approach and that the extracted FOL directives are useful in supporting code reviews.
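The last step described above, transforming formulas into conjunctive normal form, can be illustrated with a minimal Python sketch. This is not LEADFOL's implementation: the formula classes, the function names, and the example directive are all hypothetical, quantifiers are ignored, and each atomic formula is treated as an opaque proposition. The sketch only shows the textbook conversion (eliminate implications, push negations inward, distribute disjunction over conjunction).

```python
from dataclasses import dataclass
from typing import Tuple, Union


# Hypothetical formula classes; names are illustrative, not from the paper.
@dataclass(frozen=True)
class Atom:
    predicate: str
    args: Tuple[str, ...]


@dataclass(frozen=True)
class Not:
    operand: "Formula"


@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Or:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Implies:
    left: "Formula"
    right: "Formula"


Formula = Union[Atom, Not, And, Or, Implies]


def eliminate_implications(f):
    """Rewrite every A -> B as (~A | B)."""
    if isinstance(f, Atom):
        return f
    if isinstance(f, Not):
        return Not(eliminate_implications(f.operand))
    left, right = eliminate_implications(f.left), eliminate_implications(f.right)
    if isinstance(f, Implies):
        return Or(Not(left), right)
    return type(f)(left, right)


def push_negations(f):
    """Move negations down to the atoms (negation normal form)."""
    if isinstance(f, Atom):
        return f
    if isinstance(f, Not):
        g = f.operand
        if isinstance(g, Atom):
            return f
        if isinstance(g, Not):  # double negation
            return push_negations(g.operand)
        if isinstance(g, And):  # De Morgan
            return Or(push_negations(Not(g.left)), push_negations(Not(g.right)))
        if isinstance(g, Or):  # De Morgan
            return And(push_negations(Not(g.left)), push_negations(Not(g.right)))
        raise TypeError("implications must be eliminated first")
    return type(f)(push_negations(f.left), push_negations(f.right))


def distribute(f):
    """Distribute | over & so the result is a conjunction of clauses."""
    if isinstance(f, And):
        return And(distribute(f.left), distribute(f.right))
    if isinstance(f, Or):
        l, r = distribute(f.left), distribute(f.right)
        if isinstance(l, And):
            return And(distribute(Or(l.left, r)), distribute(Or(l.right, r)))
        if isinstance(r, And):
            return And(distribute(Or(l, r.left)), distribute(Or(l, r.right)))
        return Or(l, r)
    return f


def to_cnf(f):
    return distribute(push_negations(eliminate_implications(f)))


def show(f):
    """Render a formula with ASCII connectives."""
    if isinstance(f, Atom):
        return f"{f.predicate}({', '.join(f.args)})"
    if isinstance(f, Not):
        return "~" + show(f.operand)
    op = {And: " & ", Or: " | ", Implies: " -> "}[type(f)]
    return "(" + show(f.left) + op + show(f.right) + ")"


# Made-up directive: "If the index is negative or the list is empty,
# get throws IndexOutOfBoundsException."
negative = Atom("negative", ("index",))
empty = Atom("empty", ("list",))
throws = Atom("throws", ("get", "IndexOutOfBoundsException"))
directive = Implies(Or(negative, empty), throws)
cnf = to_cnf(directive)
print(show(cnf))
```

For the made-up directive, the conversion yields two clauses, one per condition, each a disjunction of the negated condition and the consequence. A clause-per-condition form like this is easy to check one clause at a time, which is one plausible reason to normalize grouped formulas.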

Index Terms

Learning-based extraction of first-order logic representations of API directives
1. Software and its engineering
  1. Software creation and management
    1. Software post-development issues
      1. Documentation
      2. Maintaining software
  2. Software notations and tools
    1. Software libraries and repositories

Published In

ESEC/FSE 2021: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering

August 2021

1690 pages

ISBN:9781450385626

DOI:10.1145/3468264

General Chairs: Diomidis Spinellis (Athens University of Economics and Business, Greece), Georgios Gousios (Facebook, Netherlands / Delft University of Technology, Netherlands)

Program Chairs: Marsha Chechik (University of Toronto, Canada), Massimiliano Di Penta (University of Sannio, Italy)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ESEC/FSE '21

Sponsor:

SIGSOFT

ESEC/FSE '21: 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

August 23 - 28, 2021

Athens, Greece

Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%
