Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3643991.3644897acmconferencesArticle/Chapter ViewAbstractPublication PagesicseConference Proceedingsconference-collections
research-article
Open access

On the Effectiveness of Machine Learning-based Call Graph Pruning: An Empirical Study

Published: 02 July 2024 Publication History

Abstract

Static call graph (CG) construction often over-approximates call relations, leading to sound, but imprecise results. Recent research has explored machine learning (ML)-based CG pruning as a means to enhance precision by eliminating false edges. However, current methods suffer from a limited evaluation dataset, imbalanced training data, and reduced recall, which affects practical downstream analyses. Prior results were also not compared with advanced static CG construction techniques yet. This study tackles these issues. We introduce the NYXCorpus, a dataset of real-world Java programs with high test coverage and we collect traces from test executions and build a ground truth of dynamic CGs. We leverage these CGs to explore conservative pruning strategies during the training and inference of ML-based CG pruners. We conduct a comparative analysis of static CGs generated using zero control flow analysis (0-CFA) and those produced by a context-sensitive 1-CFA algorithm, evaluating both with and without pruning. We find that CG pruning is a difficult task for real-world Java projects and substantial improvements in the CG precision (+25%) meet reduced recall (-9%). However, our experiments show promising results: even when we favor recall over precision by using an F2 metric in our experiments, we can show that pruned CGs have comparable quality to a context-sensitive 1-CFA analysis while being computationally less demanding. Resulting CGs are much smaller (69%), and substantially faster (3.5x speed-up), with virtually unchanged results in our downstream analysis.

References

[1]
[n. d.]. Java 1-17 Parser and Abstract Syntax Tree for Java with advanced analysis functionalities. https://javaparser.org/ Accessed: 2023-07-31.
[2]
[n. d.]. The official open-source implementation of AutoPruner. https://github.com/soarsmu/AutoPruner/ Accessed: 2023-08-01.
[3]
[n. d.]. PyTorch 2.0. https://pytorch.org/blog/pytorch-2.0-release/. 2023-06-13.
[4]
[n. d.]. PyTorch Lightning. https://lightning.ai/docs/pytorch/latest/
[5]
[n. d.]. T.J. Watson Libraries for Analysis, with frontends for Java, Android, and JavaScript, and may common static program analyses. https://github.com/wala/WALA/releases Accessed: 2023-11-17.
[6]
Karim Ali and Ondrej Lhotak. 2012. Application-only call graph construction. In ECOOP 2012--Object-Oriented Programming: 26th European Conference, Beijing, China, June 11-16, 2012. Proceedings 26. Springer, 688--712.
[7]
Miltiadis Allamanis, Earl T Barr, Premkumar Devanbu, and Charles Sutton. 2018. A survey of machine learning for big code and naturalness. ACM Computing Surveys (CSUR) 51, 4 (2018), 1--37.
[8]
Eugenio Angriman, Alexander van der Grinten, Michael Hamann, Henning Meyerhenke, and Manuel Penschuck. 2022. Algorithms for Large-Scale Network Analysis and the NetworKit Toolkit. Springer Nature Switzerland, Cham, 3--20.
[9]
Stephen M Blackburn, Robin Garner, Chris Hoffmann, Asjad M Khang, Kathryn S McKinley, Rotem Bentzur, Amer Diwan, Daniel Feinberg, Daniel Frampton, Samuel Z Guyer, et al. 2006. The DaCapo benchmarks: Java benchmarking development and analysis. In Proceedings of the 21st annual ACM SIGPLAN conference on Object-oriented programming systems, languages, and applications. 169--190.
[10]
Martin Bravenboer and Yannis Smaragdakis. 2009. Strictly declarative specification of sophisticated points-to analyses. In Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications. 243--262.
[11]
Leo Breiman. 1996. Bagging predictors. Machine learning 24 (1996), 123--140.
[12]
David Callahan, Alan Carle, Mary W. Hall, and Ken Kennedy. 1990. Constructing the procedure call multigraph. IEEE Transactions on Software Engineering 16, 4 (1990), 483--487.
[13]
Jeffrey Dean, David Grove, and Craig Chambers. 1995. Optimization of object-oriented programs using static class hierarchy analysis. In ECOOP'95---Object-Oriented Programming, 9th European Conference, Åarhus, Denmark, August 7--11, 1995 9. Springer, 77--101.
[14]
JB Dietrich, Henrik Schole, Li Sui, and Ewan Tempero. 2017. XCorpus-an executable corpus of Java programs. (2017).
[15]
Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, and Ming Zhou. 2020. CodeBERT: A Pre-Trained Model for Programming and Natural Languages. In Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics, 1536--1547.
[16]
Stephen Fink and Julian Dolby. 2012. WALA-The TJ Watson Libraries for Analysis.
[17]
Gordon Fraser and Andrea Arcuri. 2011. Evosuite: automatic test suite generation for object-oriented software. In Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering. 416--419.
[18]
Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. 2017. Accurate, large minibatch sgd: Training imagenet in 1 hour. arXiv preprint arXiv:1706.02677 (2017).
[19]
Joseph Hejderup, Arie van Deursen, and Georgios Gousios. 2018. Software ecosystem call graph for dependency management. In Proceedings of the 40th International Conference on Software Engineering: New Ideas and Emerging Results. 101--104.
[20]
Tin Kam Ho. 1995. Random decision forests. In Proceedings of 3rd international conference on document analysis and recognition, Vol. 1. IEEE, 278--282.
[21]
Hamel Husain, Ho-Hsiang Wu, Tiferet Gazit, Miltiadis Allamanis, and Marc Brockschmidt. 2019. Codesearchnet challenge: Evaluating the state of semantic code search. arXiv preprint arXiv:1909.09436 (2019).
[22]
Minseok Jeon and Hakjoo Oh. 2022. Return of CFA: call-site sensitivity can be superior to object sensitivity even for object-oriented programs. Proceedings of the ACM on Programming Languages 6, POPL (2022), 1--29.
[23]
Christian Gram Kalhauge and Jens Palsberg. 2018. Sound deadlock prediction. Proceedings of the ACM on Programming Languages 2, OOPSLA (2018), 1--29.
[24]
Ali Khatami and Andy Zaidman. 2023. State-Of-The-Practice in Quality Assurance in Java-Based Open Source Software Development. arXiv preprint arXiv:2306.09665 (2023).
[25]
Thanh Le-Cong, Hong Jin Kang, Truong Giang Nguyen, Stefanus Agus Haryono, David Lo, Xuan-Bach D Le, and Quyet Thang Huynh. 2022. AutoPruner: transformer-based call graph pruning. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 520--532.
[26]
Ondrej Lhotak. 2007. Comparing call graphs. In Proceedings of the 7th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering. 37--42.
[27]
Yue Li, Tian Tan, Anders Møller, and Yannis Smaragdakis. 2018. Scalability-first pointer analysis with self-tuning context-sensitivity. In Proceedings of the 2018 26th ACM joint meeting on european software engineering conference and symposium on the foundations of software engineering. 129--140.
[28]
Tim Lindholm, Frank Yellin, Gilad Bracha, Alex Buckley, and Daniel Smith. 2021. The Java Virtual Machine Specification: Java SE 17 Edition. (2021).
[29]
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692 (2019).
[30]
Ilya Loshchilov and Frank Hutter. 2017. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017).
[31]
Wenjie Ma, Shengyuan Yang, Tian Tan, Xiaoxing Ma, Chang Xu, and Yue Li. 2023. Context Sensitivity without Contexts: A Cut-Shortcut Approach to Fast and Precise Pointer Analysis. Proceedings of the ACM on Programming Languages 7, PLDI (2023), 539--564.
[32]
Ravi Mangal, Xin Zhang, Aditya V Nori, and Mayur Naik. 2015. A user-guided approach to program analysis. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering. 462--473.
[33]
Dimitrios Michail, Joris Kinable, Barak Naveh, and John V. Sichi. 2020. JGraphT-A Java Library for Graph Data Structures and Algorithms. ACM Trans. Math. Softw. 46, 2, Article 16 (May 2020), 29 pages.
[34]
Amir M Mir, Mehdi Keshani, and Sebastian Proksch. 2023. On the Effect of Transitivity and Granularity on Vulnerability Propagation in the Maven Ecosystem. In 2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 201--211.
[35]
Gail C Murphy, David Notkin, William G Griswold, and Erica S Lan. 1998. An empirical study of static call graph extractors. ACM Transactions on Software Engineering and Methodology (TOSEM) 7, 2 (1998), 158--191.
[36]
Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, and Caiming Xiong. 2022. Codegen: An open large language model for code with multi-turn program synthesis. arXiv preprint arXiv:2203.13474 (2022).
[37]
Jens Palsberg and Cristina V Lopes. 2018. Njr: A normalized java resource. In Companion Proceedings for the ISSTA/ECOOP 2018 Workshops. 100--106.
[38]
Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. 2019. Pytorch: An imperative style, high-performance deep learning library. In Advances in neural information processing systems. 8026--8037.
[39]
Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. 2011. Scikit-learn: Machine learning in Python. the Journal of machine Learning research 12 (2011), 2825--2830.
[40]
Serena E. Ponta, Henrik Plate, Antonino Sabetta, Michele Bezzi, and C'edric Dangremont. 2019. A Manually-Curated Dataset of Fixes to Vulnerabilities of Open-Source Software. In Proceedings of the 16th International Conference on Mining Software Repositories.
[41]
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research 21, 1 (2020), 5485--5551.
[42]
Michael Reif. 2021. Novel Approaches to Systematically Evaluating and Constructing Call Graphs for Java Software. (2021).
[43]
Michael Reif, Florian Kübler, Michael Eichberg, Dominik Helm, and Mira Mezini. 2019. Judge: Identifying, understanding, and evaluating sources of unsoundness in call graphs. In Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis. 251--261.
[44]
Henry Gordon Rice. 1953. Classes of recursively enumerable sets and their decision problems. Transactions of the American Mathematical society 74, 2 (1953), 358--366.
[45]
Barbara G Ryder. 1979. Constructing the call graph of a program. IEEE Transactions on Software Engineering 3 (1979), 216--226.
[46]
Jason Sawin and Atanas Rountev. 2011. Assumption hierarchy for a CHA call graph construction algorithm. In 2011 IEEE 11th International Working Conference on Source Code Analysis and Manipulation. IEEE, 35--44.
[47]
Tushar Sharma, Maria Kechagia, Stefanos Georgiou, Rohit Tiwari, Indira Vats, Hadi Moazen, and Federica Sarro. 2021. A survey on machine learning techniques for source code analysis. arXiv preprint arXiv:2110.09610 (2021).
[48]
Olin Grigsby Shivers. 1991. Control-flow analysis of higher-order languages or taming lambda. Carnegie Mellon University.
[49]
Yannis Smaragdakis. 2021. Doop-framework for Java pointer and taint analysis (using p/taint). Retrieved Jan 10 (2021), 2021.
[50]
Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15, 1 (2014), 1929--1958.
[51]
Li Sui, Jens Dietrich, Amjed Tahir, and George Fourtounis. 2020. On the recall of static call graph construction in practice. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering. 1049--1060.
[52]
Tian Tan, Yue Li, and Jingling Xue. 2016. Making k-object-sensitive pointer analysis more precise with still k-limiting. In Static Analysis: 23rd International Symposium, SAS 2016, Edinburgh, UK, September 8-10, 2016, Proceedings. Springer, 489--510.
[53]
Ewan Tempero, Craig Anslow, Jens Dietrich, Ted Han, Jing Li, Markus Lumpe, Hayden Melton, and James Noble. 2010. The Qualitas Corpus: A curated collection of Java code for empirical studies. In 2010 Asia pacific software engineering conference. IEEE, 336--345.
[54]
Akshay Utture, Shuyang Liu, Christian Gram Kalhauge, and Jens Palsberg. 2022. Striking a balance: pruning false-positives from static call graphs. In Proceedings of the 44th International Conference on Software Engineering. 2043--2055.
[55]
Raja Vallée-Rai, Phong Co, Etienne Gagnon, Laurie Hendren, Patrick Lam, and Vijay Sundaresan. 2010. Soot: A Java bytecode optimization framework. In CASCON First Decade High Impact Papers. 214--224.
[56]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems 30 (2017).
[57]
Yue Wang, Hung Le, Akhilesh Deepak Gotmare, Nghi DQ Bui, Junnan Li, and Steven CH Hoi. 2023. Codet5+: Open code large language models for code understanding and generation. arXiv preprint arXiv:2305.07922 (2023).
[58]
Yue Wang, Weishi Wang, Shafiq Joty, and Steven CH Hoi. 2021. CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 8696--8708.
[59]
Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, et al. 2019. Huggingface's transformers: State-of-the-art natural language processing. arXiv preprint arXiv:1910.03771 (2019).
[60]
Tao Xie and David Notkin. 2002. An empirical study of java dynamic call graph extractors. University of Washington CSE Technical Report (2002), 02--12.
[61]
Daoguang Zan, Bei Chen, Fengji Zhang, Dianjie Lu, Bingchao Wu, Bei Guan, Yongji Wang, and Jian-Guang Lou. 2022. When Neural Model Meets NL2Code: A Survey. arXiv preprint arXiv:2212.09420 (2022).
[62]
Weilei Zhang and Barbara G Ryder. 2007. Automatic construction of accurate application call graph with library call abstraction for Java. Journal of Software Maintenance and Evolution: Research and Practice 19, 4 (2007), 231--252.

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
MSR '24: Proceedings of the 21st International Conference on Mining Software Repositories
April 2024
788 pages
ISBN:9798400705878
DOI:10.1145/3643991
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs International 4.0 License.

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 July 2024

Check for updates

Author Tags

  1. call graphs
  2. machine learning
  3. pruning
  4. software analysis
  5. empirical study

Qualifiers

  • Research-article

Conference

MSR '24
Sponsor:

Upcoming Conference

ICSE 2025

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 87
    Total Downloads
  • Downloads (Last 12 months)87
  • Downloads (Last 6 weeks)20
Reflects downloads up to 03 Feb 2025

Other Metrics

Citations

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media