DOI:10.1145/3650212.3652112
research-article

API Misuse Detection via Probabilistic Graphical Model

Published: 11 September 2024

Abstract

API misuses can cause a range of problems in software, including program crashes, bugs, and vulnerabilities. Existing approaches detect API misuses automatically by checking a program against usage rules extracted from large codebases or API documentation. However, these mined rules may be imprecise or incomplete, leading to high false positive and false negative rates. In this paper, we propose a novel solution to this problem: we represent the mined API usage rules as a probabilistic graphical model, in which each rule's probability value represents how trustworthy the rule is. Our approach automatically constructs probabilistic usage rules by mining codebases and documents and aggregating knowledge from both sources. The usage rules obtained from the codebase initialize the probabilistic model, while knowledge from the documents serves as a supplement, adjusting and complementing the probabilities accordingly. We evaluate our approach on the MuBench benchmark. Experimental results show that our approach achieves 42.0% precision and 54.5% recall, significantly outperforming state-of-the-art approaches.
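
To make the idea of a rule probability that is initialized from code and then adjusted by documentation more concrete, the following is a minimal sketch and not the paper's actual model. Everything in it is an assumption made for illustration: the `UsageRule` class, the Laplace-smoothed frequency used as the initial probability, the `doc_reliability` parameter, and the single-step Bayes update are hypothetical stand-ins for whatever the probabilistic graphical model really does.

```python
from dataclasses import dataclass


@dataclass
class UsageRule:
    """Hypothetical mined rule (e.g. 'call close() after open()') with a trust value."""
    description: str
    probability: float  # belief that the rule is correct, in [0, 1]


def init_from_codebase(description: str, support: int, total: int) -> UsageRule:
    """Initialize the probability from how often mined code follows the rule.

    Assumption: a Laplace-smoothed frequency stands in for the initial value.
    """
    return UsageRule(description, (support + 1) / (total + 2))


def update_with_document(rule: UsageRule, doc_agrees: bool,
                         doc_reliability: float = 0.8) -> UsageRule:
    """Adjust the probability with one piece of documentation evidence (Bayes' rule).

    Assumption: doc_reliability is the chance that the documentation's statement
    matches the truth, whether the rule is correct or not.
    """
    p = rule.probability
    like_true = doc_reliability if doc_agrees else 1 - doc_reliability
    like_false = 1 - doc_reliability if doc_agrees else doc_reliability
    posterior = like_true * p / (like_true * p + like_false * (1 - p))
    return UsageRule(rule.description, posterior)


def is_misuse(rule: UsageRule, follows_rule: bool, threshold: float = 0.5) -> bool:
    """Report a violation only when the violated rule is trusted enough."""
    return (not follows_rule) and rule.probability >= threshold


# Hypothetical rule followed in 8 of 10 mined usages.
rule = init_from_codebase("call close() after open()", support=8, total=10)
confirmed = update_with_document(rule, doc_agrees=True)
contradicted = update_with_document(rule, doc_agrees=False)
```

In this sketch, documentation that agrees with a mined rule raises its probability, and documentation that contradicts it lowers the probability, which can push a frequent but wrong pattern below the reporting threshold. That is one plausible way that combining sources could reduce false positives.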


    Published In

    ISSTA 2024: Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis
    September 2024
    1928 pages
    ISBN:9798400706127
    DOI:10.1145/3650212

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. API misuse detection
    2. Document Mining
    3. Mining Software Repository
    4. Probabilistic Graphical Model

    Qualifiers

    • Research-article

    Funding Sources

    • National Natural Science Foundation of China

    Conference

    ISSTA '24
    Acceptance Rates

    Overall Acceptance Rate 58 of 213 submissions, 27%
