The ant and the grasshopper: fast and accurate pointer analysis for millions of lines of code

B Hardekopf, C Lin - Proceedings of the 28th ACM SIGPLAN …, 2007 - dl.acm.org
B Hardekopf, C Lin
Proceedings of the 28th ACM SIGPLAN Conference on Programming Language …, 2007dl.acm.org
Pointer information is a prerequisite for most program analyses, and the quality of this
information can greatly affect their precision and performance. Inclusion-based (ie Andersen-
style) pointer analysis is an important point in the space of pointer analyses, offering a
potential sweet-spot in the trade-off between precision and performance. However, current
techniques for inclusion-based pointer analysis can have difficulties delivering on this
potential. We introduce and evaluate two novel techniques for inclusion-based pointer …
Pointer information is a prerequisite for most program analyses, and the quality of this information can greatly affect their precision and performance. Inclusion-based (i.e. Andersen-style) pointer analysis is an important point in the space of pointer analyses, offering a potential sweet-spot in the trade-off between precision and performance. However, current techniques for inclusion-based pointer analysis can have difficulties delivering on this potential.
We introduce and evaluate two novel techniques for inclusion-based pointer analysis---one lazy, one eager1---that significantly improve upon the current state-of-the-art without impacting precision. These techniques focus on the problem of online cycle detection, a critical optimization for scaling such analyses. Using a suite of six open-source C programs, which range in size from 169K to 2.17M LOC, we compare our techniques against the three best inclusion-based analyses--described by Heintze and Tardieu [11], by Pearce et al. [21], and by Berndl et al. [4]. The combination of our two techniques results in an algorithm which is on average 3.2 xfaster than Heintze and Tardieu's algorithm, 6.4 xfaster than Pearce et al.'s algorithm, and 20.6 faster than Berndl et al.'s algorithm.
We also investigate the use of different data structures to represent points-to sets, examining the impact on both performance and memory consumption. We compare a sparse-bitmap implementation used in the GCC compiler with a BDD-based implementation, and we find that the BDD implementation is on average 2x slower than using sparse bitmaps but uses 5.5x less memory.
ACM Digital Library