Octopus: Scaling Value-Flow Analysis via Parallel Collection of Realizable Path Conditions

Published: 29 March 2024

Abstract

Value-flow analysis is a fundamental technique in program analysis, benefiting various clients such as memory corruption detection and taint analysis. However, existing approaches suffer from low potential speedup, which limits their scalability. In this work, we present Octopus, a parallel algorithm that efficiently collects path conditions for realizable paths. Octopus builds on the realizability decomposition to collect the intraprocedural path conditions of different functions simultaneously and on demand, and obtains realizable path conditions by concatenation, which achieves a high potential speedup in parallelization. We implement Octopus as a tool and evaluate it on 15 real-world programs. The experiments show that Octopus significantly outperforms the state-of-the-art algorithms. In particular, it detects NULL-pointer-dereference bugs in the llvm project, which has 6.3 MLoC, within 6.9 minutes under the 40-thread setting. We also state and prove several theorems to demonstrate the soundness, completeness, and high potential speedup of Octopus. Our empirical and theoretical results demonstrate the great potential of Octopus in supporting various program analysis clients. The implementation has been officially deployed at Ant Group, scaling the nightly code scan for massive FinTech applications.
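As a rough illustration of the idea in the abstract (collect per-function path conditions in parallel, then concatenate them along a path), here is a toy Python sketch. It is not the Octopus algorithm or its implementation: the function names, the guard strings, the `INTRA_GUARDS` table, and the helpers `collect_intra` and `path_condition` are all hypothetical, and the call chain is simply assumed to be realizable (no call/return matching is checked).

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy input: for each function, the branch guards on the
# intraprocedural segment of one value-flow path. Illustrative only.
INTRA_GUARDS = {
    "main": ["argc > 1"],
    "parse": ["buf != NULL", "len < 64"],
    "use": ["flag == 0"],
}


def collect_intra(function):
    """Build the path condition for the segment inside a single function."""
    return function, " && ".join(INTRA_GUARDS[function])


def path_condition(call_chain, workers=4):
    """Collect per-function conditions in parallel, then join them in chain order.

    Assumption: call_chain already describes a realizable path; this sketch
    does not check that calls and returns match.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each function is processed independently of the others.
        per_function = dict(pool.map(collect_intra, set(call_chain)))
    # Concatenation step: conjoin the segments in the order the value flows.
    return " && ".join(f"({per_function[f]})" for f in call_chain)


if __name__ == "__main__":
    print(path_condition(["main", "parse", "use"]))
```

The sketch only shows why the per-function work can run independently: no function's condition depends on another's until the final concatenation step. A real analysis would hand the resulting formula to a constraint solver rather than build strings.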

Supplementary Material

3632743-supp (3632743-supp.pdf)
Supplementary material

Cited By

  • (2024) REACT: IR-Level Patch Presence Test for Binary. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, pp. 381-392. DOI: 10.1145/3691620.3695012. Online publication date: 27-Oct-2024
  • (2024) Exploring Scalability of Value-Flow Graph Construction. Proceedings of the 2024 4th International Conference on Artificial Intelligence, Automation and High Performance Computing, pp. 112-117. DOI: 10.1145/3690931.3690951. Online publication date: 19-Jul-2024

Published In

ACM Transactions on Software Engineering and Methodology, Volume 33, Issue 3
March 2024
943 pages
EISSN: 1557-7392
DOI: 10.1145/3613618
  • Editor: Mauro Pezzé

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 March 2024
Online AM: 24 January 2024
Accepted: 19 October 2023
Revised: 10 October 2023
Received: 22 February 2023
Published in TOSEM Volume 33, Issue 3

Author Tags

  1. Value-flow analysis
  2. parallel computation

Qualifiers

  • Research-article

Funding Sources

  • Hong Kong Research Grant Council and the Innovation and Technology Commission, Ant Group, and the donations from Microsoft and Huawei

Article Metrics

  • Downloads (Last 12 months): 365
  • Downloads (Last 6 weeks): 34
Reflects downloads up to 01 Jan 2025
