On the Effectiveness of Machine Learning-based Call Graph Pruning: An Empirical Study

Authors:

Sebastian Proksch

MSR '24: Proceedings of the 21st International Conference on Mining Software Repositories

Pages 457 - 468

https://doi.org/10.1145/3643991.3644897

Published: 02 July 2024

Abstract

Static call graph (CG) construction often over-approximates call relations, leading to sound but imprecise results. Recent research has explored machine learning (ML)-based CG pruning as a means to enhance precision by eliminating false edges. However, current methods suffer from a limited evaluation dataset, imbalanced training data, and reduced recall, which affects practical downstream analyses. Prior results have also not yet been compared with advanced static CG construction techniques. This study tackles these issues. We introduce the NYXCorpus, a dataset of real-world Java programs with high test coverage, and we collect traces from test executions to build a ground truth of dynamic CGs. We leverage these CGs to explore conservative pruning strategies during the training and inference of ML-based CG pruners. We conduct a comparative analysis of static CGs generated using zero control flow analysis (0-CFA) and those produced by a context-sensitive 1-CFA algorithm, evaluating both with and without pruning. We find that CG pruning is a difficult task for real-world Java projects: substantial improvements in CG precision (+25%) come at the cost of reduced recall (-9%). However, our experiments show promising results: even when we favor recall over precision by using an F2 metric, pruned CGs have comparable quality to a context-sensitive 1-CFA analysis while being computationally less demanding. The resulting CGs are much smaller (69%) and substantially faster to process (3.5x speed-up), with virtually unchanged results in our downstream analysis.
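To make the precision/recall trade-off and the F2 metric concrete, the following is a minimal sketch, not the paper's implementation. It treats a CG as a set of (caller, callee) edges, scores a static CG against a dynamic ground truth, and prunes edges whose predicted probability falls below a threshold. The edge names, the scores, and the helper functions `precision_recall`, `f_beta`, and `prune` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (caller, callee); illustrative representation


def precision_recall(static_edges: Iterable[Edge],
                     dynamic_edges: Iterable[Edge]) -> Tuple[float, float]:
    """Compare a static CG against the dynamic ground-truth CG."""
    static_set, dynamic_set = set(static_edges), set(dynamic_edges)
    true_positives = len(static_set & dynamic_set)
    precision = true_positives / len(static_set) if static_set else 0.0
    recall = true_positives / len(dynamic_set) if dynamic_set else 0.0
    return precision, recall


def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """F-beta score; beta = 2 weights recall more heavily than precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def prune(scored_edges: Dict[Edge, float], threshold: float) -> Set[Edge]:
    """Keep only edges whose predicted probability reaches the threshold."""
    return {edge for edge, score in scored_edges.items() if score >= threshold}


# Toy data (assumed): edges observed at run time and hypothetical model scores.
ground_truth = {("A.main", "B.run"), ("B.run", "C.log")}
scores = {
    ("A.main", "B.run"): 0.9,
    ("B.run", "C.log"): 0.4,
    ("B.run", "D.log"): 0.2,   # false edge from over-approximation
    ("A.main", "E.init"): 0.1,  # false edge from over-approximation
}

for threshold in (0.0, 0.3, 0.5):
    pruned = prune(scores, threshold)
    p, r = precision_recall(pruned, ground_truth)
    print(f"threshold={threshold}: P={p:.2f} R={r:.2f} F2={f_beta(p, r):.2f}")
```

In this toy setting, a lower threshold is the more conservative choice: it removes fewer edges and so is less likely to drop a true one, which is why a recall-weighted metric such as F2 can favor it over a more aggressive cut-off.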


Published In

MSR '24: Proceedings of the 21st International Conference on Mining Software Repositories

April 2024

788 pages

ISBN: 9798400705878

DOI: 10.1145/3643991

Chair: Diomidis Spinellis
Program Chair: Alberto Bacchelli
Program Co-chair: Eleni Constantinou

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

MSR '24

Sponsor:

SIGSOFT

MSR '24: 21st International Conference on Mining Software Repositories

April 15 - 16, 2024

Lisbon, Portugal
