research-article
Open access

Taming transitive redundancy for context-free language reachability

Published: 31 October 2022

Abstract

Given an edge-labeled graph, context-free language reachability (CFL-reachability) computes reachable node pairs by deriving new edges and adding them to the graph. The redundancy that limits the scalability of CFL-reachability manifests as redundant derivations: the same edge can be derived multiple times because many paths connect two reachable nodes. We observe that most redundancy arises from derivations involving transitive relations between reachable node pairs. Unfortunately, existing techniques for reducing redundancy in transitive-closure-based problems are either ineffective at, or inapplicable to, identifying and eliminating redundant derivations during on-the-fly CFL-reachability solving.
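To make the notion of a redundant derivation concrete, here is a minimal sketch of a standard worklist-style CFL-reachability solver that counts how often an already-known edge is derived again. It assumes a grammar restricted to productions of the form A ::= B and A ::= B C; the function name `cfl_reach`, the `stats` counters, and the input encoding are illustrative choices, not taken from the paper.

```python
from collections import defaultdict, deque


def cfl_reach(edges, unary, binary):
    """Worklist CFL-reachability over productions A ::= B and A ::= B C.

    edges  : iterable of (src, label, dst) input edges
    unary  : dict mapping B to the list of A with A ::= B   (assumed encoding)
    binary : list of (A, B, C) triples meaning A ::= B C    (assumed encoding)
    Returns (set of all edges, stats dict with derivation counters).
    """
    out_edges = defaultdict(set)  # (node, label) -> successor nodes
    in_edges = defaultdict(set)   # (node, label) -> predecessor nodes
    seen = set()
    work = deque()
    stats = {"derived": 0, "redundant": 0}

    def add(u, label, v, derived=True):
        if derived:
            stats["derived"] += 1
        if (u, label, v) in seen:
            if derived:
                stats["redundant"] += 1  # same edge derived once more
            return
        seen.add((u, label, v))
        out_edges[(u, label)].add(v)
        in_edges[(v, label)].add(u)
        work.append((u, label, v))

    for u, label, v in edges:
        add(u, label, v, derived=False)

    while work:
        u, b, v = work.popleft()
        for a in unary.get(b, ()):
            add(u, a, v)
        for a, x, y in binary:
            if x == b:  # popped edge is the left operand: u -x-> v -y-> w
                for w in list(out_edges[(v, y)]):
                    add(u, a, w)
            if y == b:  # popped edge is the right operand: w -x-> u -y-> v
                for w in list(in_edges[(u, x)]):
                    add(w, a, v)
    return seen, stats


# Usage: a 4-node chain with the transitive grammar A ::= e | A A.
chain = [(0, "e", 1), (1, "e", 2), (2, "e", 3)]
closure, stats = cfl_reach(chain, {"e": ["A"]}, [("A", "A", "A")])
print(stats)
```

Even on this tiny chain, the transitive rule A ::= A A derives some edges more than once, for instance the edge from node 0 to node 3 can be assembled through node 1 or through node 2. The gap between `stats["derived"]` and the number of distinct derived edges is the redundancy the abstract refers to.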
This paper proposes a scalable yet precision-preserving approach to all-pairs CFL-reachability analysis by taming its transitive redundancy. Our key insight is that transitive relations are intrinsically ordered, and utilizing this order for edge derivation can avoid most redundancy. To address the challenges in determining the derivation order from a graph that changes dynamically during CFL-reachability solving, we introduce a hybrid graph representation that combines spanning trees and adjacency lists, together with a dynamic construction algorithm. Based on this representation, we propose a fast and effective partially ordered algorithm, POCR, that boosts the performance of CFL-reachability analysis by reducing its transitive redundancy during on-the-fly solving. Our experiments on context-sensitive value-flow analysis and field-sensitive alias analysis for C/C++ demonstrate the promising performance of POCR. On average, POCR eliminates 98.50% and 97.26% of redundant derivations for the value-flow and alias analyses, respectively, achieving speedups of 21.48× and 19.57× over the standard CFL-reachability algorithm. We also compare POCR with two recent open-source tools, Graspan (a CFL-reachability solver) and Soufflé (a Datalog engine). The results demonstrate that POCR is over 3.67× faster than Graspan and Soufflé on average for both value-flow analysis and alias analysis.
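The idea of letting spanning trees fix a derivation order can be illustrated with a classic spanning-tree-based incremental transitive closure, sketched below for plain (unlabeled) reachability. This is not the paper's POCR algorithm or its hybrid representation; the class name `SpanningTreeClosure`, the `melds` counter, and the dense `reach` matrix are assumptions made for illustration only.

```python
from collections import defaultdict


class SpanningTreeClosure:
    """Incremental reflexive-transitive closure over nodes 0..n-1.

    Each node u keeps a spanning tree of the nodes it reaches. Inserting an
    edge x -> y grafts y's tree into the tree of every u that reaches x but
    not yet y, skipping subtrees whose root u already reaches.
    """

    def __init__(self, n):
        self.n = n
        # reach[u][v] is True when v is in u's spanning tree.
        self.reach = [[False] * n for _ in range(n)]
        # children[u][v] lists the children of v inside u's spanning tree.
        self.children = [defaultdict(list) for _ in range(n)]
        for u in range(n):
            self.reach[u][u] = True
        self.melds = 0  # number of reachable pairs recorded so far

    def add_edge(self, x, y):
        if self.reach[x][y]:
            return  # the edge adds no new reachability
        for u in range(self.n):
            if self.reach[u][x] and not self.reach[u][y]:
                self._meld(u, x, y)

    def _meld(self, u, x, y):
        # Walk y's tree top-down, attaching unseen nodes to u's tree.
        stack = [(x, y)]
        while stack:
            parent, v = stack.pop()
            if self.reach[u][v]:
                continue  # u already reaches v and everything below it
            self.reach[u][v] = True
            self.children[u][parent].append(v)
            self.melds += 1
            for w in self.children[y][v]:
                if not self.reach[u][w]:
                    stack.append((v, w))


# Usage: the same 4-node chain, followed by two edges that add nothing new.
tc = SpanningTreeClosure(4)
for x, y in [(0, 1), (1, 2), (2, 3), (0, 2), (0, 3)]:
    tc.add_edge(x, y)
print(tc.melds, tc.reach[0][3])
```

In this sketch each reachable pair is recorded exactly once, because a node is attached to a tree only when it is not already in it and whole subtrees are skipped once their root is known to be reachable. Deriving pairs in tree order is what avoids re-deriving the same pair along alternative paths.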

Published In

Proceedings of the ACM on Programming Languages, Volume 6, Issue OOPSLA2
October 2022
1932 pages
EISSN: 2475-1421
DOI: 10.1145/3554307
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery
New York, NY, United States

Author Tags

1. CFL-reachability
2. performance
3. redundancy
4. transitive relation
