DOI: 10.1145/3213846.3213860
research-article

Shooting from the heap: ultra-scalable static analysis with heap snapshots

Published: 12 July 2018 Publication History

Abstract

Traditional whole-program static analysis (e.g., a points-to analysis that models the heap) encounters scalability problems for realistic applications. We propose a "featherweight" analysis that combines a dynamic snapshot of the heap with otherwise full static analysis of program behavior.
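
To make the idea concrete, here is a minimal toy sketch of a static reachability pass that reads heap contents from a snapshot instead of modeling heap flow itself. It is not the paper's implementation: the `SNAPSHOT` and `PROGRAM` encodings, the statement forms, and the `analyze` function are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical heap snapshot: object id -> (runtime type, {field: target object id}).
SNAPSHOT = {
    "o1": ("Zoo", {"pet": "o2"}),
    "o2": ("Dog", {}),
}

# Hypothetical program: method name -> list of statements.
#   ("load", dst, base, field) models  dst = base.field
#   ("call", receiver, name)   models  receiver.name()
PROGRAM = {
    "Zoo.run": [("load", "p", "this", "pet"), ("call", "p", "speak")],
    "Dog.speak": [],
    "Cat.speak": [],
}


def analyze(entry_method, entry_receiver):
    """Return the methods reachable from the entry point.

    Local variables are tracked statically (one in-order pass per method),
    while every field read is answered from the snapshot.
    """
    reachable = set()
    seen = set()
    worklist = [(entry_method, entry_receiver)]
    while worklist:
        method, this_obj = worklist.pop()
        if (method, this_obj) in seen:
            continue
        seen.add((method, this_obj))
        reachable.add(method)

        var_pts = defaultdict(set)  # local variable -> snapshot objects
        var_pts["this"].add(this_obj)
        for stmt in PROGRAM[method]:
            if stmt[0] == "load":
                _, dst, base, field = stmt
                for obj in list(var_pts[base]):
                    target = SNAPSHOT[obj][1].get(field)
                    if target is not None:
                        var_pts[dst].add(target)
            elif stmt[0] == "call":
                _, receiver, name = stmt
                for obj in list(var_pts[receiver]):
                    # Dispatch on the runtime type recorded in the snapshot.
                    callee = f"{SNAPSHOT[obj][0]}.{name}"
                    if callee in PROGRAM:
                        worklist.append((callee, obj))
    return reachable


print(sorted(analyze("Zoo.run", "o1")))
```

In this toy, `Cat.speak` is never reported as reachable because the snapshot holds only a `Dog` in the `pet` field. That illustrates both sides of the approach: fewer spurious call targets, but any behavior that requires heap contents absent from the snapshot is missed.
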
The analysis is extremely scalable, offering speedups of well over 100x, with complexity empirically evaluated to grow linearly relative to the number of reachable methods. The analysis is also an excellent tradeoff of precision and recall (relative to different dynamic executions): while it can never fully capture all program behaviors (i.e., it cannot match the near-perfect recall of a full static analysis), it often approaches it closely while achieving much higher (3.5x) precision.
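
The precision and recall mentioned above can be read as set comparisons between what an analysis predicts and what a separate dynamic execution actually exhibits. The sketch below assumes the compared facts are call-graph edges; the function name `precision_recall` and all edge data are invented for illustration and are not results from the paper.

```python
def precision_recall(predicted, observed):
    """Compare predicted facts against facts observed in a dynamic run.

    precision = |predicted ∩ observed| / |predicted|
    recall    = |predicted ∩ observed| / |observed|
    """
    predicted, observed = set(predicted), set(observed)
    hits = predicted & observed
    precision = len(hits) / len(predicted) if predicted else 1.0
    recall = len(hits) / len(observed) if observed else 1.0
    return precision, recall


# Made-up call edges observed in one dynamic execution.
observed = {("main", "a"), ("a", "b"), ("a", "c"), ("c", "d")}

# A hypothetical full static analysis: covers everything observed, plus
# six edges that never occur in this run.
full_static = observed | {("x", str(i)) for i in range(6)}

# A hypothetical snapshot-based analysis: misses one observed edge and
# reports one edge that does not occur.
snapshot_based = {("main", "a"), ("a", "b"), ("a", "c"), ("a", "e")}

print(precision_recall(full_static, observed))
print(precision_recall(snapshot_based, observed))
```

With this invented data, the full static analysis has perfect recall but low precision, while the snapshot-based one gives up some recall for higher precision, which is the shape of the tradeoff the abstract describes.
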

      Published In

      ISSTA 2018: Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis
      July 2018
      379 pages
      ISBN:9781450356992
      DOI:10.1145/3213846
      • General Chair: Frank Tip
      • Program Chair: Eric Bodden

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Heap Snapshots
      2. Program Analysis
      3. Scalability

      Qualifiers

      • Research-article

      Funding Sources

      • European Research Council
      • Oracle Labs collaborative research grant
      • Reach High Malta
      • Facebook Research and Academic Relations award

      Conference

      ISSTA '18

      Acceptance Rates

      Overall Acceptance Rate 58 of 213 submissions, 27%

      Article Metrics

      • Downloads (last 12 months): 23
      • Downloads (last 6 weeks): 6
      Reflects downloads up to 09 Nov 2024

      Cited By

      • (2024) The ART of Sharing Points-to Analysis: Reusing Points-to Analysis Results Safely and Efficiently. Proceedings of the ACM on Programming Languages 8:OOPSLA2 (2606-2632). DOI: 10.1145/3689803. Online publication date: 8-Oct-2024
      • (2024) Scaling Type-Based Points-to Analysis with Saturation. Proceedings of the ACM on Programming Languages 8:PLDI (990-1013). DOI: 10.1145/3656417. Online publication date: 20-Jun-2024
      • (2024) Efficient Construction of Practical Python Call Graphs with Entity Knowledge Base. International Journal of Software Engineering and Knowledge Engineering 34:07 (999-1024). DOI: 10.1142/S0218194024500104. Online publication date: 22-May-2024
      • (2023) Comparing Rapid Type Analysis with Points-To Analysis in GraalVM Native Image. Proceedings of the 20th ACM SIGPLAN International Conference on Managed Programming Languages and Runtimes (129-142). DOI: 10.1145/3617651.3622980. Online publication date: 19-Oct-2023
      • (2022) Striking a balance. Proceedings of the 44th International Conference on Software Engineering (2043-2055). DOI: 10.1145/3510003.3510166. Online publication date: 21-May-2022
      • (2022) Fast and precise application code analysis using a partial library. Proceedings of the 44th International Conference on Software Engineering (934-945). DOI: 10.1145/3510003.3510046. Online publication date: 21-May-2022
      • (2022) A theory of monitors. Information and Computation 281:C. DOI: 10.1016/j.ic.2021.104704. Online publication date: 3-Jan-2022
      • (2022) Fluently specifying taint-flow queries with TQL. Empirical Software Engineering 27:5. DOI: 10.1007/s10664-022-10165-y. Online publication date: 1-Sep-2022
      • (2021) Detecting Memory-Related Bugs by Tracking Heap Memory Management of C++ Smart Pointers. 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE) (880-891). DOI: 10.1109/ASE51524.2021.9678836. Online publication date: Nov-2021
      • (2020) On the recall of static call graph construction in practice. Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering (1049-1060). DOI: 10.1145/3377811.3380441. Online publication date: 27-Jun-2020
