research-article
Open access

Recursive State Machine Guided Graph Folding for Context-Free Language Reachability

Published: 06 June 2023

Abstract

Context-free language reachability (CFL-reachability) is a fundamental framework for program analysis. A large variety of static analyses can be formulated as CFL-reachability problems, which determine whether specific source-sink pairs in an edge-labeled graph are connected by a reachable path, i.e., a path whose edge labels form a string accepted by the given CFL. Computing CFL-reachability is expensive: the fastest algorithm exhibits only a slightly subcubic time complexity with respect to the input graph size. Improving the scalability of CFL-reachability is of practical interest, but reducing the time complexity is inherently difficult.
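
To make the problem statement concrete, the following is a minimal sketch of the classic cubic worklist formulation of CFL-reachability, not the subcubic algorithm mentioned above and not the paper's implementation. The function name `cfl_reach`, the rule encoding, and the small balanced-parentheses grammar (S → ε | S S | A b, A → a S) are assumptions chosen for illustration.

```python
from collections import deque


def cfl_reach(nodes, edges, eps_rules, unary_rules, binary_rules):
    """Return every derivable fact (src, symbol, dst).

    edges        : iterable of (src, label, dst) triples
    eps_rules    : nonterminals X with a rule X -> epsilon
    unary_rules  : list of (X, Y) for rules X -> Y
    binary_rules : list of (X, (Y, Z)) for rules X -> Y Z
    """
    facts = set()
    out_idx = {}  # node -> set of (symbol, dst)
    in_idx = {}   # node -> set of (symbol, src)
    work = deque()

    def add(u, sym, v):
        # Record a new fact once and schedule it for processing.
        if (u, sym, v) not in facts:
            facts.add((u, sym, v))
            out_idx.setdefault(u, set()).add((sym, v))
            in_idx.setdefault(v, set()).add((sym, u))
            work.append((u, sym, v))

    for (u, label, v) in edges:
        add(u, label, v)
    for x in eps_rules:
        for n in nodes:
            add(n, x, n)

    while work:
        u, y, v = work.popleft()
        for x, rhs in unary_rules:
            if rhs == y:
                add(u, x, v)
        for x, (y1, y2) in binary_rules:
            if y1 == y:  # popped fact is the left operand
                for (sym, w) in list(out_idx.get(v, ())):
                    if sym == y2:
                        add(u, x, w)
            if y2 == y:  # popped fact is the right operand
                for (sym, t) in list(in_idx.get(u, ())):
                    if sym == y1:
                        add(t, x, v)
    return facts


# Illustrative graph: 0 -a-> 1 -a-> 2 -b-> 3 -b-> 4
nodes = [0, 1, 2, 3, 4]
edges = [(0, "a", 1), (1, "a", 2), (2, "b", 3), (3, "b", 4)]
facts = cfl_reach(
    nodes,
    edges,
    eps_rules=["S"],
    unary_rules=[],
    binary_rules=[("S", ("S", "S")), ("S", ("A", "b")), ("A", ("a", "S"))],
)
print((0, "S", 4) in facts)  # the path labels "aabb" are balanced
```

In this sketch, node 4 is S-reachable from node 0 because the labels along the path spell a balanced string, whereas node 3 is not S-reachable from node 0.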
In this paper, we focus on improving the scalability of CFL-reachability from a more practical perspective: reducing the input graph size. Our idea arises from the existence of trivial edges, i.e., edges that do not affect any reachable path in CFL-reachability. We observe that two nodes joined by trivial edges can be folded (merged into a single node, with all the edges joining them removed) without affecting the CFL-reachability result. By studying the characteristics of recursive state machines (RSMs), an alternative form of CFLs, we propose an approach to identify foldable node pairs without the need to verify the underlying reachable paths (which is equivalent to solving the CFL-reachability problem). In particular, given a CFL-reachability problem instance with an input graph G and an RSM, based on the correspondence between paths in G and state transitions in the RSM, we propose a graph folding principle, which can determine whether two adjacent nodes are foldable by examining only their incoming and outgoing edges.
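
The folding operation itself (merging two nodes and dropping the edges that join them) can be sketched as below. This is only an illustration of the merge step under stated assumptions: the helper name `fold` and the edge label `e` are hypothetical, and the sketch does not implement the RSM-based folding principle that decides whether a pair is foldable. The caller is assumed to already know that the pair is joined only by trivial edges.

```python
def fold(edges, x, y):
    """Merge node y into node x, dropping every edge that joins x and y.

    edges is a set of (src, label, dst) triples. Whether (x, y) is actually
    foldable must be decided elsewhere; this helper only performs the merge.
    """
    folded = set()
    for (u, label, v) in edges:
        if {u, v} == {x, y}:
            continue  # edge joins the two folded nodes: remove it
        u2 = x if u == y else u  # redirect edges leaving y
        v2 = x if v == y else v  # redirect edges entering y
        folded.add((u2, label, v2))
    return folded


# Illustrative graph in which the edge 1 -e-> 2 is assumed to be trivial.
graph = {(0, "a", 1), (1, "e", 2), (2, "b", 3)}
smaller = fold(graph, 1, 2)
print(sorted(smaller))
```

After the fold, the graph in this example has one node and one edge fewer, which is the kind of size reduction the approach relies on to speed up the subsequent reachability computation.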
On top of the graph folding principle, we propose an efficient graph folding algorithm GF. The time complexity of GF is linear with respect to the number of nodes in the input graph. Our evaluations on two clients (alias analysis and value-flow analysis) show that GF significantly accelerates RSM/CFL-reachability by reducing the input graph size. On average, for value-flow analysis, GF removes 60.96% of the nodes and 42.67% of the edges of the input graphs, obtaining a speedup of 4.65× and a memory usage reduction of 57.35%. For alias analysis, GF removes 38.93% of the nodes and 35.61% of the edges of the input graphs, obtaining a speedup of 3.21× and a memory usage reduction of 65.19%.

Supplementary Material

Auxiliary Archive (pldi23main-p98-p-archive.zip)
Supplementary material of the paper "Recursive State Machine Guided Graph Folding for Context-Free Language Reachability", including the proofs of Property 4.1 and Property 4.2 of the paper.



    Published In

    Proceedings of the ACM on Programming Languages  Volume 7, Issue PLDI
    June 2023
    2020 pages
    EISSN:2475-1421
    DOI:10.1145/3554310
    This work is licensed under a Creative Commons Attribution 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. CFL-reachability
    2. graph simplification
    3. recursive state machines

    Qualifiers

    • Research-article

    Cited By

    • (2024) Iterative-Epoch Online Cycle Elimination for Context-Free Language Reachability. Proceedings of the ACM on Programming Languages, 8:OOPSLA1 (1437-1462). https://doi.org/10.1145/3649862. Online publication date: 29-Apr-2024
    • (2024) Dynamic Transitive Closure-based Static Analysis through the Lens of Quantum Search. ACM Transactions on Software Engineering and Methodology, 33:5 (1-29). https://doi.org/10.1145/3644389. Online publication date: 4-Jun-2024
    • (2024) Fast Graph Simplification for Path-Sensitive Typestate Analysis through Tempo-Spatial Multi-Point Slicing. Proceedings of the ACM on Software Engineering, 1:FSE (494-516). https://doi.org/10.1145/3643749. Online publication date: 12-Jul-2024
    • (2023) Two Birds with One Stone: Multi-Derivation for Fast Context-Free Language Reachability Analysis. 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE) (624-636). https://doi.org/10.1109/ASE56229.2023.00118. Online publication date: 11-Sep-2023
    • (2023) Vulnerability Detection via Typestate-Guided Code Representation Learning. Formal Methods and Software Engineering (291-297). https://doi.org/10.1007/978-981-99-7584-6_22. Online publication date: 21-Nov-2023
