Automatically Inspecting Thousands of Static Bug Warnings with Large Language Model: How Far Are We?

Authors:

Tian Cong

ACM Transactions on Knowledge Discovery from Data, Volume 18, Issue 7

Article No.: 168, Pages 1 - 34

https://doi.org/10.1145/3653718

Published: 19 June 2024

Abstract

Static analysis tools for capturing bugs and vulnerabilities in software programs are widely employed in practice, as they offer the unique advantages of high coverage and independence from the execution environment. However, when analyzing large codebases, existing tools often produce far more false warnings than genuine bug reports. As a result, developers must manually inspect and confirm each warning, a challenging and time-consuming task that calls for automation.

This article presents Llm4sa, a fast, general, and easily extensible approach that automatically inspects large volumes of static warnings by harnessing (some of) the capabilities of Large Language Models (LLMs). Our key insight is that LLMs have advanced program understanding capabilities, which enable them to act as human experts who manually inspect bug warnings together with their relevant code snippets. In this spirit, we propose a static analysis that extracts the relevant code snippets via program dependence traversal guided by the bug warning reports themselves. Then, by querying LLMs with customized questions enriched with domain knowledge and representative cases, Llm4sa removes a large number of false warnings and significantly facilitates bug discovery. Our experiments demonstrate that Llm4sa is practical for automatically inspecting thousands of static warnings from the Juliet benchmark programs and 11 real-world C/C++ projects, achieving high precision (81.13%) and recall (94.64%) over a total of 9,547 bug warnings. Our research opens new opportunities and methodologies for using LLMs to reduce human labor costs, improve the precision of static analyzers, and ensure software trustworthiness.
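The two steps described in the abstract, warning-guided extraction of relevant code through program dependence traversal and a question to the LLM enriched with domain knowledge and representative cases, can be illustrated with a small sketch. It assumes the program dependence graph has already been computed and is given as a map from each line to the lines it directly depends on (data or control). The record type `BugWarning`, the functions `backward_slice`, `build_prompt`, and `parse_verdict`, the depth bound, the toy C program, and the prompt wording are all hypothetical; this is not Llm4sa's actual implementation or prompt template, and no LLM is called here.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class BugWarning:
    """Hypothetical warning record: bug type, flagged line, analyzer message."""
    bug_type: str
    line: int
    message: str


def backward_slice(deps: Dict[int, Set[int]], start: int, max_depth: int = 3) -> List[int]:
    """Breadth-first backward traversal from the flagged line.

    `deps` maps each line to the lines it directly depends on. Returns the
    flagged line plus every line reachable within `max_depth` dependence
    hops, in source order.
    """
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        line, depth = frontier.popleft()
        if depth == max_depth:
            continue  # depth bound reached: do not expand this node further
        for pred in deps.get(line, set()):
            if pred not in seen:
                seen.add(pred)
                frontier.append((pred, depth + 1))
    return sorted(seen)


def build_prompt(warning: BugWarning, source: List[str], lines: List[int],
                 knowledge: str, examples: List[str]) -> str:
    """Assemble a question with domain knowledge, example cases, and the slice."""
    snippet = "\n".join(f"{n}: {source[n - 1]}" for n in lines)  # lines are 1-indexed
    cases = "\n\n".join(examples)
    return (
        "You are reviewing a static analysis warning.\n"
        f"Bug type: {warning.bug_type}\n"
        f"Warning: {warning.message} (line {warning.line})\n\n"
        f"Background knowledge:\n{knowledge}\n\n"
        f"Representative cases:\n{cases}\n\n"
        f"Relevant code:\n{snippet}\n\n"
        "Is this warning a real bug or a false positive? Answer REAL or "
        "FALSE POSITIVE on the first line, then explain your reasoning."
    )


def parse_verdict(reply: str) -> Optional[bool]:
    """Map the reply's first line to True (real bug), False (false positive), or None."""
    lines = reply.strip().splitlines()
    if not lines:
        return None
    first = lines[0].strip().upper()
    if first.startswith("FALSE POSITIVE"):
        return False
    if first.startswith("REAL"):
        return True
    return None


# Toy C program (1-indexed lines) and a hand-written dependence map.
SRC = [
    "char *buf = malloc(n);",
    "if (buf == NULL) return -1;",
    "size_t len = strlen(s);",
    "memcpy(buf, s, len);",
    "free(buf);",
]
DEPS = {2: {1}, 4: {1, 2, 3}, 5: {1}}

if __name__ == "__main__":
    w = BugWarning("NULL pointer dereference", 4, "buf may be NULL when passed to memcpy")
    sliced = backward_slice(DEPS, w.line)
    print(sliced)  # [1, 2, 3, 4]
    print(build_prompt(
        w, SRC, sliced,
        knowledge="malloc may return NULL; a dereference is safe only if every path checks the pointer first.",
        examples=["Case: p = malloc(k); p[0] = 0;  -> REAL (no NULL check)"],
    ))
```

In this toy example, the traversal from the flagged `memcpy` on line 4 keeps the allocation, the NULL check, and the length computation, while the unrelated `free` on line 5 is left out, so the question stays short and focused on the evidence a human reviewer would look at. The depth bound here is only one simple way to trade context for prompt size and is an assumption of this sketch.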

Index Terms

Automatically Inspecting Thousands of Static Bug Warnings with Large Language Model: How Far Are We?
1. Security and privacy
  1. Software and application security
2. Software and its engineering

Information

Published In

ACM Transactions on Knowledge Discovery from Data Volume 18, Issue 7

August 2024

505 pages

EISSN:1556-472X

DOI:10.1145/3613689

Editor:
Jian Pei
Duke University, USA

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 19 June 2024

Online AM: 26 March 2024

Accepted: 03 March 2024

Revised: 30 January 2024

Received: 15 September 2023

Published in TKDD Volume 18, Issue 7

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China
China Postdoctoral Science Foundation
Fundamental Research Funds for the Central Universities

Bibliometrics & Citations

Article Metrics

Total Citations: 3
Total Downloads: 1,145
Downloads (Last 12 months): 1,145
Downloads (Last 6 weeks): 162

Reflects downloads up to 22 Sep 2024
