Automatically Inspecting Thousands of Static Bug Warnings with Large Language Model: How Far Are We?

Published: 19 June 2024 Publication History

Abstract

Static analysis tools for capturing bugs and vulnerabilities in software programs are widely employed in practice, as they offer high coverage and independence from the execution environment. However, existing tools for analyzing large codebases often produce far more false warnings than genuine bug reports. As a result, developers must manually inspect and confirm each warning, a challenging and time-consuming task that calls for automation.
This article advocates a fast, general, and easily extensible approach called LLM4SA that automatically inspects a large volume of static warnings by harnessing (some of) the powers of Large Language Models (LLMs). Our key insight is that LLMs have advanced program understanding capabilities, enabling them to act as human experts in manually inspecting bug warnings together with their relevant code snippets. In this spirit, we propose a static analysis that extracts the relevant code snippets via program dependence traversal guided by the bug warning reports themselves. Then, by formulating customized questions that are enriched with domain knowledge and representative cases to query LLMs, LLM4SA can remove a great deal of false warnings and significantly facilitate bug discovery. Our experiments demonstrate that LLM4SA is practical in automatically inspecting thousands of static warnings from Juliet benchmark programs and 11 real-world C/C++ projects, achieving high precision (81.13%) and recall (94.64%) over a total of 9,547 bug warnings. Our research introduces new opportunities and methodologies for using LLMs to reduce human labor costs, improve the precision of static analyzers, and ensure software trustworthiness.
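The two steps described above (dependence-guided snippet extraction, then a question to the LLM) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the dependence map, the function names `backward_slice` and `build_prompt`, the `max_lines` budget, and the prompt wording are all hypothetical, and a real tool would derive dependences from a program dependence graph rather than a hand-written dictionary.

```python
from collections import deque


def backward_slice(deps, warning_line, max_lines=50):
    """Return the sorted lines that the warning line transitively depends on.

    deps is a hypothetical map: line number -> lines it depends on
    (data or control dependence). max_lines caps the snippet size.
    """
    seen = {warning_line}
    queue = deque([warning_line])
    while queue:
        current = queue.popleft()
        for pred in deps.get(current, ()):
            if pred not in seen and len(seen) < max_lines:
                seen.add(pred)
                queue.append(pred)
    return sorted(seen)


def build_prompt(source, lines, warning_line, bug_type):
    """Assemble an illustrative question for an LLM from the sliced lines."""
    snippet = "\n".join(f"{n}: {source[n]}" for n in lines)
    return (
        f"A static analyzer reports a possible {bug_type} "
        f"at line {warning_line}.\n"
        f"Relevant code:\n{snippet}\n"
        "Is this warning a true bug or a false alarm? Explain briefly."
    )


# Hypothetical four-line C fragment, keyed by line number.
source = {
    1: "char *p = malloc(n);",
    2: "int unrelated = 0;",
    3: "if (p == NULL) return;",
    4: "p[0] = 'a';",
}
# Line 4 depends on the allocation (1) and the guard (3); the guard reads p.
deps = {4: [1, 3], 3: [1]}

relevant = backward_slice(deps, warning_line=4)
prompt = build_prompt(source, relevant, 4, "null pointer dereference")
print(prompt)
```

In this toy case the slice keeps the allocation and the null check but drops the unrelated declaration, so the question sent to the model contains the guard that makes the warning a likely false alarm.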

Cited By

  • (2024) Secure Cryptographic Technology Framework for Data Element Circulation Transactions. 2024 IEEE 11th International Conference on Cyber Security and Cloud Computing (CSCloud), 187-193. DOI: 10.1109/CSCloud62866.2024.00040. Online publication date: 28-Jun-2024
  • (2024) Enchanting Program Specification Synthesis by Large Language Models Using Static Analysis and Program Verification. Computer Aided Verification, 302-328. DOI: 10.1007/978-3-031-65630-9_16. Online publication date: 24-Jul-2024
  • (2024) CFStra: Enhancing Configurable Program Analysis Through LLM-Driven Strategy Selection Based on Code Features. Theoretical Aspects of Software Engineering, 374-391. DOI: 10.1007/978-3-031-64626-3_22. Online publication date: 14-Jul-2024

      Published In

      ACM Transactions on Knowledge Discovery from Data  Volume 18, Issue 7
      August 2024
      505 pages
      EISSN:1556-472X
      DOI:10.1145/3613689
      • Editor: Jian Pei

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 19 June 2024
      Online AM: 26 March 2024
      Accepted: 03 March 2024
      Revised: 30 January 2024
      Received: 15 September 2023
      Published in TKDD Volume 18, Issue 7

      Author Tags

      1. Large language model
      2. static analysis
      3. AI for program analysis
      4. static bug warning
      5. false alarms

      Qualifiers

      • Research-article

      Funding Sources

      • National Natural Science Foundation of China
      • China Postdoctoral Science Foundation
      • Fundamental Research Funds for the Central Universities

      Article Metrics

      • Downloads (last 12 months): 1,145
      • Downloads (last 6 weeks): 162
      Reflects downloads up to 22 Sep 2024
