research-article

API Misuse Detection via Probabilistic Graphical Model

Authors:

Li Li

ISSTA 2024: Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis

Pages 88 - 99

https://doi.org/10.1145/3650212.3652112

Published: 11 September 2024

Abstract

API misuses can cause a range of issues in software development, including program crashes, bugs, and vulnerabilities. Various approaches have been developed to automatically detect API misuses by checking a program against usage rules extracted from large codebases or API documentation. However, these mined rules may be imprecise or incomplete, leading to high false positive and false negative rates. In this paper, we propose a novel solution to this problem: we represent the mined API usage rules as a probabilistic graphical model, in which each rule's probability value represents how trustworthy the rule is. Our approach automatically constructs probabilistic usage rules by mining codebases and documents and aggregating knowledge from these sources. The usage rules obtained from the codebase initialize the probabilistic model, while the knowledge from the documents supplements it by adjusting and complementing the probabilities. We evaluate our approach on the MuBench benchmark. Experimental results show that our approach achieves 42.0% precision and 54.5% recall, significantly outperforming state-of-the-art approaches.
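
To make the idea of a rule carrying a trust probability concrete, the following is a minimal illustrative sketch, not the method described in the paper. It assumes a single hypothetical rule form ("a call to `first` should later be followed by `then`"), takes the initial probability as a simple co-occurrence frequency in a codebase, and treats a document as a noisy witness whose reliability (`doc_reliability`) is an invented parameter. All names, the update formula, and the threshold are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRule:
    """Hypothetical rule: a call to `first` should later be followed by `then`."""
    first: str
    then: str
    prob: float  # assumed meaning: trust that the rule is correct


def init_from_codebase(first, then, n_first, n_both):
    """Assumed initialization: share of `first` usages that are followed by `then`."""
    if n_first <= 0:
        raise ValueError("n_first must be positive")
    return UsageRule(first, then, n_both / n_first)


def adjust_with_document(rule, doc_supports, doc_reliability=0.9):
    """Bayes update that treats the document as a noisy witness.

    doc_reliability is an assumed P(document agrees | rule is correct),
    and also P(document disagrees | rule is incorrect).
    """
    p = rule.prob
    like_true = doc_reliability if doc_supports else 1 - doc_reliability
    like_false = 1 - doc_reliability if doc_supports else doc_reliability
    posterior = like_true * p / (like_true * p + like_false * (1 - p))
    return UsageRule(rule.first, rule.then, posterior)


def is_misuse(rule, calls, threshold=0.8):
    """Flag a call sequence only when a sufficiently trusted rule is violated."""
    if rule.prob < threshold or rule.first not in calls:
        return False
    after_first = calls[calls.index(rule.first) + 1:]
    return rule.then not in after_first


if __name__ == "__main__":
    # Made-up counts: `open` appears 100 times, followed by `close` 60 times.
    rule = init_from_codebase("open", "close", n_first=100, n_both=60)
    print(rule.prob, is_misuse(rule, ["open", "read"]))
    boosted = adjust_with_document(rule, doc_supports=True)
    print(round(boosted.prob, 3), is_misuse(boosted, ["open", "read"]))
```

In this toy setup, a rule mined with moderate support is not trusted enough to raise a warning on its own; agreement from documentation raises its probability above the threshold, while disagreement would lower it. A full probabilistic graphical model would also capture dependencies between rules, which this sketch deliberately omits.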

Index Terms

1. Software and its engineering
  1. Software notations and tools
    1. Software libraries and repositories

Published In

ISSTA 2024: Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis

September 2024

1928 pages

ISBN:9798400706127

DOI:10.1145/3650212

General Chair: Maria Christakis (TU Wien, Austria)

Program Chair: Michael Pradel (University of Stuttgart, Germany)

Copyright © 2024 Copyright is held by the owner/author(s). Publication rights licensed to ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

National Natural Science Foundation of China

Conference

ISSTA '24

Sponsor:

SIGSOFT

ISSTA '24: 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis

September 16 - 20, 2024

Vienna, Austria

Acceptance Rates

Overall Acceptance Rate 58 of 213 submissions, 27%
