A practical automatic polyhedral parallelizer and locality optimizer
U Bondhugula, A Hartono, J Ramanujam… - Proceedings of the 29th …, 2008 - dl.acm.org
Proceedings of the 29th ACM SIGPLAN Conference on Programming Language …, 2008•dl.acm.org
We present the design and implementation of an automatic polyhedral source-to-source transformation framework that can optimize regular programs (sequences of possibly imperfectly nested loops) for parallelism and locality simultaneously. Through this work, we show the practicality of analytical model-driven automatic transformation in the polyhedral model, far beyond what is possible with current production compilers. Unlike previous work, our approach is end-to-end and fully automatic, driven by an integer linear optimization framework that takes an explicit view of finding good ways of tiling for parallelism and locality using affine transformations. The framework has been implemented in a tool that automatically generates OpenMP parallel code from C program sections. Experimental results from the tool show very high speedups for local and parallel execution on multi-cores over state-of-the-art compiler frameworks from the research community as well as the best native production compilers. The system also enables the easy use of powerful empirical/iterative optimization for general arbitrarily nested loop sequences.
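To make "tiling for parallelism and locality" concrete, here is a minimal hand-written sketch of the kind of transformation the abstract describes: a C loop nest rewritten into tiled form with an OpenMP pragma on the outermost tile loop. It is not output of the tool. The function names `matmul_ref` and `matmul_tiled`, the helper `min_l`, the loop order, and the caller-supplied `tile` size are all assumptions made for illustration; the framework in the paper chooses its transformations automatically through integer linear optimization rather than by hand.

```c
#include <assert.h>

/* Original loop nest: C += A * B for n x n row-major matrices. */
void matmul_ref(long n, const double *A, const double *B, double *C)
{
    for (long i = 0; i < n; i++)
        for (long j = 0; j < n; j++)
            for (long k = 0; k < n; k++)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
}

/* Hypothetical helper: clamps a tile's upper bound to the matrix edge. */
static long min_l(long a, long b) { return a < b ? a : b; }

/* Hand-tiled version of the same computation (illustrative only).
 * The three outer loops walk over tiles; the three inner loops
 * work inside one tile so its data can stay in cache. */
void matmul_tiled(long n, long tile, const double *A, const double *B, double *C)
{
    assert(n >= 0 && tile > 0);

    /* Each iteration of the `it` loop writes a disjoint band of rows
     * of C, so the iterations can run in parallel without locking. */
    #pragma omp parallel for
    for (long it = 0; it < n; it += tile)
        for (long jt = 0; jt < n; jt += tile)
            for (long kt = 0; kt < n; kt += tile)
                for (long i = it; i < min_l(it + tile, n); i++)
                    for (long k = kt; k < min_l(kt + tile, n); k++)
                        for (long j = jt; j < min_l(jt + tile, n); j++)
                            C[i * n + j] += A[i * n + k] * B[k * n + j];
}
```

With GCC or Clang, compiling with `-fopenmp` enables the pragma; without that flag the pragma is ignored and the tiled loop nest runs serially with the same result. A suitable `tile` value depends on the cache sizes of the target machine, which is the sort of parameter that empirical or iterative tuning would search over.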