DOI: 10.1109/ICSE43902.2021.00146
research-article

PyCG: Practical Call Graph Generation in Python

Published: 05 November 2021
    Abstract

    Call graphs play an important role in different contexts, such as profiling and vulnerability propagation analysis. Generating call graphs efficiently can be challenging for high-level languages that are modular and incorporate dynamic features and higher-order functions.
    Despite the language's popularity, there have been very few tools aiming to generate call graphs for Python programs. Worse, these tools suffer from several effectiveness issues that limit their practicality in realistic programs. We propose a pragmatic, static approach for call graph generation in Python. We compute all assignment relations between program identifiers of functions, variables, classes, and modules through an inter-procedural analysis. Based on these assignment relations, we produce the resulting call graph by resolving all calls to potentially invoked functions. Notably, the underlying analysis is designed to be efficient and scalable, handling several Python features, such as modules, generators, function closures, and multiple inheritance.
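
    The idea of resolving calls through assignment relations can be illustrated with a minimal sketch. This is not PyCG's implementation: it is a toy, flow-insensitive resolver built on the standard `ast` module that ignores scoping, attributes, classes, and modules. The names `build_call_graph`, `EXAMPLE`, and the functions inside the example program are hypothetical and chosen only for illustration.

```python
import ast
from collections import defaultdict
from typing import Dict, Optional, Set

# Hypothetical program to analyze (not taken from the paper).
EXAMPLE = '''
def greet():
    return "hi"

def run(f):
    return f()

alias = greet
handler = alias

def main():
    handler()
    run(greet)
'''


def build_call_graph(source: str) -> Dict[str, Set[str]]:
    """Toy call graph: resolve each call through name-to-name assignments."""
    tree = ast.parse(source)
    functions = {n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}

    # Assignment relations: identifier -> identifiers it may have been assigned from.
    points_to: Dict[str, Set[str]] = defaultdict(set)
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Name):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    points_to[target.id].add(node.value.id)

    def resolve(name: str, seen: Optional[Set[str]] = None) -> Set[str]:
        """Follow assignment edges transitively until function definitions are reached."""
        seen = set() if seen is None else seen
        if name in seen:
            return set()
        seen.add(name)
        found = {name} if name in functions else set()
        for source_name in points_to.get(name, ()):
            found |= resolve(source_name, seen)
        return found

    graph: Dict[str, Set[str]] = defaultdict(set)

    def visit(node: ast.AST, caller: str) -> None:
        # Calls are attributed to the innermost enclosing function definition.
        if isinstance(node, ast.FunctionDef):
            caller = node.name
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            graph[caller] |= resolve(node.func.id)
        for child in ast.iter_child_nodes(node):
            visit(child, caller)

    visit(tree, "<module>")
    return dict(graph)


if __name__ == "__main__":
    print(build_call_graph(EXAMPLE))
```

    In this sketch the call `handler()` inside `main` resolves to `greet` by following `handler -> alias -> greet`. The call `f()` inside `run` stays unresolved, because the toy does not track values passed as arguments; handling such flows is where an inter-procedural analysis like the one described above is needed.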
    We have evaluated our prototype implementation, which we call PyCG, using two benchmarks: a micro-benchmark suite containing small Python programs and a set of macro-benchmarks with several popular real-world Python packages. Our results indicate that PyCG can efficiently handle thousands of lines of code in less than a second (0.38 seconds for 1k LoC on average). Further, it outperforms the state-of-the-art for Python in both precision and recall: PyCG achieves high precision (~99.2%) and adequate recall (~69.9%). Finally, we demonstrate how PyCG can aid dependency impact analysis by showcasing a potential enhancement to GitHub's "security advisory" notification service using a real-world example.
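
    The two uses of a call graph mentioned above, scoring it against a ground truth and checking whether a vulnerable function is reachable, can be sketched as follows. The abstract does not spell out how precision and recall are measured, so the edge-level comparison here is an assumption. All function, package, and variable names (`precision_recall`, `reaches`, `app.main`, `somelib.unsafe_load`, and so on) are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (caller, callee)


def precision_recall(generated: Iterable[Edge], ground_truth: Iterable[Edge]) -> Tuple[float, float]:
    """Edge-level precision and recall of a generated call graph (assumed metric)."""
    gen, truth = set(generated), set(ground_truth)
    true_positives = len(gen & truth)
    precision = true_positives / len(gen) if gen else 1.0
    recall = true_positives / len(truth) if truth else 1.0
    return precision, recall


def reaches(call_graph: Dict[str, Set[str]], start: str, target: str) -> bool:
    """Breadth-first search: can `start` transitively call `target`?"""
    queue, seen = deque([start]), {start}
    while queue:
        current = queue.popleft()
        if current == target:
            return True
        for callee in call_graph.get(current, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return False


# Hypothetical data: the generated graph misses one true edge.
GENERATED = {("main", "greet"), ("main", "run")}
GROUND_TRUTH = {("main", "greet"), ("main", "run"), ("run", "greet")}

# Hypothetical application whose dependency exposes a vulnerable function.
APP_GRAPH = {
    "app.main": {"app.load_config"},
    "app.load_config": {"somelib.unsafe_load"},
    "app.helper": set(),
}

if __name__ == "__main__":
    print(precision_recall(GENERATED, GROUND_TRUTH))
    print(reaches(APP_GRAPH, "app.main", "somelib.unsafe_load"))
```

    A reachability check of this kind is what would let an advisory be reported only to projects that can actually invoke the affected function, rather than to every project that declares the dependency.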

    Published In

    ICSE '21: Proceedings of the 43rd International Conference on Software Engineering
    May 2021
    1768 pages
    ISBN: 9781450390859

    Publisher

    IEEE Press

    Author Tags

    1. Call Graph
    2. Inter-procedural Analysis
    3. Program Analysis
    4. Vulnerability Propagation

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICSE '21
    Acceptance Rates

    Overall Acceptance Rate 276 of 1,856 submissions, 15%

    Article Metrics

    • Downloads (last 12 months): 68
    • Downloads (last 6 weeks): 11
    Reflects downloads up to 27 Jul 2024

    Cited By

    • (2024) Bloat beneath Python’s Scales: A Fine-Grained Inter-Project Dependency Analysis. Proceedings of the ACM on Software Engineering 1:FSE (2584-2607). DOI: 10.1145/3660821. Online publication date: 12-Jul-2024
    • (2024) The Emergence of Large Language Models in Static Analysis: A First Look through Micro-Benchmarks. Proceedings of the 2024 IEEE/ACM First International Conference on AI Foundation Models and Software Engineering (35-39). DOI: 10.1145/3650105.3652288. Online publication date: 14-Apr-2024
    • (2024) DyPyBench: A Benchmark of Executable Python Software. Proceedings of the ACM on Software Engineering 1:FSE (338-358). DOI: 10.1145/3643742. Online publication date: 12-Jul-2024
    • (2024) TypeEvalPy: A Micro-benchmarking Framework for Python Type Inference Tools. Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings (49-53). DOI: 10.1145/3639478.3640033. Online publication date: 14-Apr-2024
    • (2024) PyAnalyzer: An Effective and Practical Approach for Dependency Extraction from Python Code. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (1-12). DOI: 10.1145/3597503.3640325. Online publication date: 20-May-2024
    • (2024) An empirical study of fault localization in Python programs. Empirical Software Engineering 29:4. DOI: 10.1007/s10664-024-10475-3. Online publication date: 13-Jun-2024
    • (2023) FreePart: Hardening Data Processing Software via Framework-based Partitioning and Isolation. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4 (169-188). DOI: 10.1145/3623278.3624760. Online publication date: 25-Mar-2023
    • (2023) A Cocktail Approach to Practical Call Graph Construction. Proceedings of the ACM on Programming Languages 7:OOPSLA2 (1001-1033). DOI: 10.1145/3622833. Online publication date: 16-Oct-2023
    • (2023) Automatically Resolving Dependency-Conflict Building Failures via Behavior-Consistent Loosening of Library Version Constraints. Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (198-210). DOI: 10.1145/3611643.3616264. Online publication date: 30-Nov-2023
    • (2023) That’s a Tough Call: Studying the Challenges of Call Graph Construction for WebAssembly. Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis (892-903). DOI: 10.1145/3597926.3598104. Online publication date: 12-Jul-2023