research-article
Open access

On-the-Fly Static Analysis via Dynamic Bidirected Dyck Reachability

Published: 05 January 2024

Abstract

Dyck reachability is a principled, graph-based formulation of a plethora of static analyses. Bidirected graphs capture dataflow through mutable heap data and are the usual formalism for demand-driven points-to and alias analyses. The best (offline) algorithm runs in O(m + n·α(n)) time, where n is the number of nodes, m is the number of edges in the flow graph, and α is the inverse Ackermann function; because m can grow quadratically in n, this becomes O(n²) in the worst case.
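
To make the offline computation concrete: in a bidirected graph, Dyck reachability is an equivalence relation, and two nodes u and v that both have an (i-labeled edge into the same equivalence class are connected by a balanced path u (i … )i v, so their classes must be merged. The Python sketch below computes this fixpoint with union-find. It is a simplified illustration, not the optimal offline algorithm: the function name and the edge encoding (triples (u, label, w) meaning u --(label--> w, with the reverse w --)label--> u implied, and label None for an epsilon edge) are our own assumptions, and because it merges label maps smaller-into-larger it does not attain the O(m + n·α(n)) bound.

```python
def bidirected_dyck_classes(n, edges):
    """Return cls with cls[u] == cls[v] iff u and v are Dyck-reachable.

    Hypothetical helper. edges: iterable of (u, label, w) meaning
    u --(label--> w; the reverse w --)label--> u is implied because the
    graph is bidirected. label None denotes an epsilon edge.
    """
    parent = list(range(n))
    # inc[r]: label -> one witness node with a (label-edge into class r
    inc = [{} for _ in range(n)]

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        pending = [(a, b)]
        while pending:
            x, y = pending.pop()
            ra, rb = find(x), find(y)
            if ra == rb:
                continue
            if len(inc[ra]) < len(inc[rb]):
                ra, rb = rb, ra  # fold the smaller label map into the larger
            parent[rb] = ra
            for label, u in inc[rb].items():
                if label in inc[ra]:
                    # Both witnesses enter the merged class via (label,
                    # so a balanced path connects them: merge them too.
                    pending.append((u, inc[ra][label]))
                else:
                    inc[ra][label] = u
            inc[rb] = {}

    for u, label, w in edges:
        if label is None:
            union(u, w)  # epsilon edges connect their endpoints directly
            continue
        r = find(w)
        if label in inc[r]:
            union(u, inc[r][label])
        else:
            inc[r][label] = u
    return [find(v) for v in range(n)]
```

For example, with edges 0 --(1--> 2 and 1 --(1--> 2, nodes 0 and 1 end up in one class; adding 3 --(2--> 0 and 4 --(2--> 1 then also merges 3 and 4, regardless of the order in which the four edges are processed.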
In the everyday practice of program analysis, the analyzed code is subject to continuous change, with source code being added and removed. On-the-fly static analysis under such continuous updates gives rise to dynamic Dyck reachability, where reachability queries run on a dynamically changing graph, following program updates. Naturally, executing the offline algorithm in this online setting is inadequate, as the time required to process a single update is prohibitively large.
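
The gap between the offline and online settings can be seen in a naive baseline. The sketch below is not the dynamic algorithm of this work; it is a hypothetical baseline (class and method names are ours, using the same assumed edge encoding as above). Insertions are cheap because Dyck reachability only grows when edges are added, so union-find can absorb them incrementally. A deletion, however, may split an equivalence class, which union-find cannot undo, so the baseline reruns the offline computation over all remaining edges.

```python
from collections import Counter


class DynamicDyckBaseline:
    """Hypothetical baseline for dynamic bidirected Dyck reachability.

    Edges are (u, label, w) meaning u --(label--> w (reverse implied);
    label None denotes an epsilon edge.
    """

    def __init__(self, n):
        self.n = n
        self.edges = Counter()  # multiset of the current edges
        self._rebuild()

    def _find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def _union(self, a, b):
        pending = [(a, b)]
        while pending:
            x, y = pending.pop()
            ra, rb = self._find(x), self._find(y)
            if ra == rb:
                continue
            if len(self.inc[ra]) < len(self.inc[rb]):
                ra, rb = rb, ra  # keep the larger label map
            self.parent[rb] = ra
            for label, u in self.inc[rb].items():
                if label in self.inc[ra]:
                    pending.append((u, self.inc[ra][label]))
                else:
                    self.inc[ra][label] = u
            self.inc[rb] = {}

    def _absorb(self, u, label, w):
        if label is None:
            self._union(u, w)
            return
        r = self._find(w)
        if label in self.inc[r]:
            self._union(u, self.inc[r][label])
        else:
            self.inc[r][label] = u

    def _rebuild(self):
        # Offline recomputation from scratch over the current edge set.
        self.parent = list(range(self.n))
        self.inc = [{} for _ in range(self.n)]
        for u, label, w in self.edges:
            self._absorb(u, label, w)

    def insert(self, u, label, w):
        self.edges[(u, label, w)] += 1
        self._absorb(u, label, w)  # monotone: an incremental merge suffices

    def delete(self, u, label, w):
        key = (u, label, w)
        if self.edges[key] == 0:
            return
        self.edges[key] -= 1
        if self.edges[key] == 0:
            del self.edges[key]
        self._rebuild()  # a class may split, so recompute everything

    def query(self, u, v):
        return self._find(u) == self._find(v)
```

Each insertion costs only a few union-find operations, but every deletion pays for a full offline recomputation; avoiding that rebuild is exactly what a dynamic algorithm has to achieve.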
In this work we develop a novel dynamic algorithm for bidirected Dyck reachability that has O(n·α(n)) worst-case performance per update, thus beating the O(n²) bound, and is also optimal in certain settings. We also implement our algorithm, evaluate its performance on on-the-fly data-dependence and alias analyses, and compare it with the two best-known alternatives, namely (i) the optimal offline algorithm and (ii) a fully dynamic Datalog solver. Our experiments show that our dynamic algorithm is consistently, and by far, the top-performing algorithm, exhibiting speedups on the order of 1000×. The running time of each update is almost always imperceptible to a human user, making the algorithm ideal for the on-the-fly analysis setting.

Information

Published In

Proceedings of the ACM on Programming Languages, Volume 8, Issue POPL
January 2024
2820 pages
EISSN:2475-1421
DOI:10.1145/3554315
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published in PACMPL Volume 8, Issue POPL

Author Tags

  1. CFL reachability
  2. dynamic algorithms
  3. static analysis

Qualifiers

  • Research-article

Funding Sources

  • VILLUM FONDEN
  • SERB MATRICS

Bibliometrics

Article Metrics

  • 0 Total Citations
  • 414 Total Downloads
  • 361 Downloads (last 12 months)
  • 33 Downloads (last 6 weeks)
Reflects downloads up to 10 Feb 2025
