research-article

Open access

Building a Polyhedral Representation from an Instrumented Execution: Making Dynamic Analyses of Nonaffine Programs Scalable

Authors: Christophe Guillon, Louis-Noël Pouchet, Fabrice Rastello

ACM Transactions on Architecture and Code Optimization (TACO), Volume 16, Issue 4

Article No.: 45, Pages 1 - 26

https://doi.org/10.1145/3363785

Published: 17 December 2019

Abstract

The polyhedral model has been used successfully in production compilers. Nevertheless, only a very restricted class of applications can benefit from it. Recent proposals investigated how runtime information could be used to apply polyhedral optimization to applications that do not statically fit the model. In this work, we go one step further in that direction. We propose the folding-based analysis, which builds a compact polyhedral representation from the output of an instrumented program execution. It accurately detects affine dependencies, fixed-stride memory accesses, and induction variables in programs. It scales to real-life applications, which often include some nonaffine dependencies and accesses in otherwise affine code, thanks to a safe, fine-grained polyhedral overapproximation mechanism. We evaluate our analysis on the entire Rodinia benchmark suite, where it provides accurate feedback about the potential for complex polyhedral transformations.
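The abstract does not detail how the folding-based analysis works. As a rough intuition only, the Python sketch below shows the simplest form of the question it answers for one memory instruction: given the iteration vectors observed in a trace and the address accessed at each of them, is the address an exact affine function of the loop counters (a fixed-stride access), or must it be overapproximated? This is not the authors' algorithm. The function names `solve_affine` and `fold_accesses` are hypothetical, the whole trace is assumed to fit in memory rather than being processed as a stream, and the fallback to a single [min, max] address interval is a deliberately crude stand-in for the fine-grained polyhedral overapproximation the paper describes.

```python
from fractions import Fraction
from typing import List, Optional, Sequence, Tuple

Point = Tuple[int, ...]


def solve_affine(points: Sequence[Point],
                 values: Sequence[int]) -> Optional[Tuple[List[Fraction], Fraction]]:
    """Find (coeffs, const) with value == sum(c * x) + const for every sample.

    Exact Gauss-Jordan elimination over the rationals; returns None when no
    single affine function fits all samples.
    """
    if not points:
        raise ValueError("need at least one sample")
    dims = len(points[0])
    # One augmented row per sample: [x_1, ..., x_d, 1 | value].
    rows = [[Fraction(x) for x in p] + [Fraction(1), Fraction(v)]
            for p, v in zip(points, values)]
    n_unknowns = dims + 1
    pivot_cols: List[int] = []
    r = 0
    for c in range(n_unknowns):
        if r == len(rows):
            break
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # unknown c stays free and is set to 0 below
        rows[r], rows[pivot] = rows[pivot], rows[r]
        scale = rows[r][c]
        rows[r] = [x / scale for x in rows[r]]
        # Eliminate column c from every other row.
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivot_cols.append(c)
        r += 1
    # Rows below the last pivot have all-zero coefficients; a non-zero
    # right-hand side means the samples contradict every affine function.
    if any(row[-1] != 0 for row in rows[r:]):
        return None
    solution = [Fraction(0)] * n_unknowns
    for i, c in enumerate(pivot_cols):
        solution[c] = rows[i][-1]
    return solution[:dims], solution[dims]


def fold_accesses(points: Sequence[Point], addresses: Sequence[int]) -> dict:
    """Summarize one instruction's addresses as an affine function if possible.

    Otherwise return an interval that contains every address seen in this
    trace (it says nothing about addresses on other inputs).
    """
    fit = solve_affine(points, addresses)
    if fit is not None:
        coeffs, const = fit
        return {"kind": "affine", "coeffs": coeffs, "const": const}
    return {"kind": "range", "lo": min(addresses), "hi": max(addresses)}


if __name__ == "__main__":
    # Row-major walk over a 3x4 array of 8-byte elements starting at address 1000.
    pts = [(i, j) for i in range(3) for j in range(4)]
    print(fold_accesses(pts, [1000 + 8 * (4 * i + j) for i, j in pts]))
    # Indirect access A[idx[i]] with idx = [3, 0, 2, 1], 4-byte elements at 2000.
    idx = [3, 0, 2, 1]
    print(fold_accesses([(i,) for i in range(4)], [2000 + 4 * k for k in idx]))
```

For the row-major loop nest, the sketch recovers strides 32 and 8 and base address 1000; for the indirect access no affine function fits, so it falls back to the interval [2000, 2012]. Exact rational arithmetic avoids floating-point round-off when deciding whether a fit is exact.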

Index Terms

Software and its engineering → Software notations and tools → Compilers

Published In

ACM Transactions on Architecture and Code Optimization Volume 16, Issue 4

December 2019

572 pages

ISSN:1544-3566

EISSN:1544-3973

DOI:10.1145/3366460

Editor:
Koen De Bosschere
Ghent University, Belgium

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 December 2019

Accepted: 01 September 2019

Revised: 01 August 2019

Received: 01 February 2019

Published in TACO Volume 16, Issue 4

Qualifiers

Research-article
Research
Refereed

Funding Sources

U.S. National Science Foundation
French program Investissement d'avenir
LabEx PERSYVAL-Lab

Article Metrics

Total citations: 5
Total downloads: 697
Downloads (last 12 months): 128
Downloads (last 6 weeks): 14

Reflects downloads up to 09 Nov 2024.

Cited By

K. Cheshmi, Z. Cetinic, and M. Dehnavi. 2022. Vectorizing sparse matrix computations with partially-strided codelets. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, 1--15. https://dl.acm.org/doi/10.5555/3571885.3571927

B. Gerard, T. Grosser, and M. Kong. 2022. QRANE: lifting QASM programs to an affine IR. In Proceedings of the 31st ACM SIGPLAN International Conference on Compiler Construction, 15--28. https://dl.acm.org/doi/10.1145/3497776.3517775

K. Cheshmi, Z. Cetinic, and M. Dehnavi. 2022. Vectorizing sparse matrix computations with partially-strided codelets. In SC22: International Conference for High Performance Computing, Networking, Storage and Analysis, 1--15. https://doi.org/10.1109/SC41404.2022.00037

W. Niu, J. Guan, Y. Wang, G. Agrawal, and B. Ren. 2021. DNNFusion: accelerating deep neural networks execution with advanced operator fusion. In Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, 883--898. https://dl.acm.org/doi/10.1145/3453483.3454083

A. Morihata and S. Sato. 2021. Reverse engineering for reduction parallelization via semiring polynomials. In Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, 820--834. https://dl.acm.org/doi/10.1145/3453483.3454079
