DOI: 10.1145/3597926.3598041
research-article

Reducing the Memory Footprint of IFDS-Based Data-Flow Analyses using Fine-Grained Garbage Collection

Published: 13 July 2023

Abstract

The IFDS algorithm can be both memory- and compute-intensive for large programs, as it needs to store a huge number of path edges in memory and process them until a fixed point is reached. In general, an IFDS-based data-flow analysis, such as taint analysis, aims to discover only the data-flow facts at some program points. Maintaining a huge number of path edges (many of which are visited only once) wastes memory and, consequently, reduces the scalability and efficiency of the analysis (due to frequent re-hashings of the path-edge data structure used).
This paper introduces a fine-grained garbage collection (GC) algorithm to enable (multi-threaded) IFDS to reduce its memory footprint by removing non-live path edges (i.e., ones that are no longer needed for establishing other path edges) from its path-edge data structure. The resulting IFDS algorithm, named FPC, retains the correctness, precision, and termination properties of IFDS while avoiding re-processing GC’ed path edges redundantly (in the presence of unknown recursive cycles that may be formed in future iterations of the analysis). Unlike CleanDroid, which augments IFDS with a coarse-grained GC algorithm to collect path edges at the method level, FPC is fine-grained by collecting path edges at the data-fact level. As a result, FPC can collect more path edges than CleanDroid, and consequently, cause fewer re-hashings for the path-edge data structure used. In our evaluation, we focus on applying an IFDS-based taint analysis to a set of 28 Android apps. FPC can scalably analyze three apps that CleanDroid fails to run to completion (under a 3-hour budget per app) due to out-of-memory (OoM). For the remaining 25 apps, FPC reduces the number of path edges and memory usage incurred under CleanDroid by 4.4× and 1.4× on average, respectively, and consequently, outperforms CleanDroid by 1.7× on average (with 18.5× in the best case).
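The difference between method-level and data-fact-level collection can be illustrated with a small toy model. The sketch below is not the FPC or CleanDroid algorithm: it assumes a simplified liveness rule (a group of path edges is collectable once no pending worklist edge belongs to the same group) and ignores the inter-procedural dependencies and recursive cycles that a real collector must respect. The names `PathEdge`, `collectable`, `by_method`, and `by_fact` are hypothetical.

```python
from collections import Counter
from typing import NamedTuple


class PathEdge(NamedTuple):
    method: str  # method that contains the edge
    d1: str      # data-flow fact at the method entry
    node: str    # target statement of the edge
    d2: str      # data-flow fact holding at `node`


def collectable(stored, pending, key):
    """Return the stored edges whose group (under `key`) has no pending work.

    Assumption for illustration only: a group is non-live as soon as the
    worklist holds no edge from that group.
    """
    busy = Counter(key(e) for e in pending)
    return {e for e in stored if busy[key(e)] == 0}


def by_method(edge):
    # Coarse-grained grouping: one group per method.
    return edge.method


def by_fact(edge):
    # Fine-grained grouping: one group per (method, entry fact) pair.
    return (edge.method, edge.d1)


stored = {
    PathEdge("foo", "a", "s1", "a"),
    PathEdge("foo", "a", "s2", "b"),
    PathEdge("foo", "c", "s1", "c"),
    PathEdge("bar", "x", "t1", "x"),
}
# One edge of `foo` is still waiting in the worklist, for entry fact "c".
pending = [PathEdge("foo", "c", "s2", "c")]

coarse = collectable(stored, pending, by_method)
fine = collectable(stored, pending, by_fact)
print(len(coarse), len(fine))  # prints "1 3"
```

In this toy setting the pending edge keeps all of `foo` alive under method-level grouping, whereas fact-level grouping only keeps the edges rooted at entry fact `c`, so more stored edges can be dropped.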

Cited By

  • (2024) Boosting the Performance of Alias-Aware IFDS Analysis with CFL-Based Environment Transformers. Proceedings of the ACM on Programming Languages 8, OOPSLA2, 2633–2661. https://doi.org/10.1145/3689804. Online publication date: 8-Oct-2024
  • (2023) Merge-Replay: Efficient IFDS-Based Taint Analysis by Consolidating Equivalent Value Flows. 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), 319–331. https://doi.org/10.1109/ASE56229.2023.00027. Online publication date: 11-Sep-2023

    Published In

    ISSTA 2023: Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis
    July 2023
    1554 pages
    ISBN:9798400702211
    DOI:10.1145/3597926

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. IFDS
    2. Path Edge Collection
    3. Taint Analysis

    Qualifiers

    • Research-article

    Conference

    ISSTA '23

    Acceptance Rates

    Overall Acceptance Rate 58 of 213 submissions, 27%
