A Proposal for Loop-Transformation Pragmas

Kruse, Michael; Finkel, Hal

doi:10.1007/978-3-319-98521-3_3

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 11128)

Included in the following conference series:

International Workshop on OpenMP

Abstract

Pragmas for loop transformations, such as unrolling, are implemented in most mainstream compilers. Application programmers use them because they are easier to apply than directly modifying the source code of the relevant loops. We propose additional pragmas for common loop transformations that go far beyond the transformations today’s compilers provide and should make most source rewriting for the sake of loop optimization unnecessary. To encourage compilers to implement these pragmas, and to avoid a diversity of incompatible syntaxes, we would like to spark a discussion about their inclusion in the OpenMP standard.
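To illustrate the ease-of-use argument, the following minimal sketch contrasts a hand-unrolled loop with the same loop annotated with an unrolling pragma that existing compilers already accept. It uses the vendor-specific `#pragma clang loop unroll_count` and `#pragma GCC unroll` forms, not the syntax proposed in the paper; the function names `sum_manual` and `sum_pragma` are hypothetical and chosen only for this example.

```c
#include <stddef.h>

/* Hand-unrolled by a factor of 4: the transformation is written into the
 * source, including a separate remainder loop for the leftover elements. */
double sum_manual(const double *a, size_t n) {
    double s = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    for (; i < n; ++i) /* remainder iterations */
        s += a[i];
    return s;
}

/* Same computation with the loop left in its original form. The pragma only
 * asks the compiler to unroll; other compilers simply see a plain loop. */
double sum_pragma(const double *a, size_t n) {
    double s = 0.0;
#if defined(__clang__)
#pragma clang loop unroll_count(4)
#elif defined(__GNUC__) && __GNUC__ >= 8
#pragma GCC unroll 4
#endif
    for (size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}
```

Both functions add the elements in the same order, so they return the same value. The pragma version keeps the loop readable, but it needs a different spelling for each compiler, which is the kind of incompatibility a standardized syntax would avoid.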

This is a U.S. government work, and the U.S. government retains certain licensing rights.

Acknowledgments

This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of two U.S. Department of Energy organizations (Office of Science and the National Nuclear Security Administration) responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, hardware, advanced system engineering, and early testbed platforms, in support of the nation’s exascale computing imperative.

This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.

Author information

Authors and Affiliations

Argonne Leadership Computing Facility, Argonne National Laboratory, Argonne, IL, 60439, USA
Michael Kruse & Hal Finkel

Corresponding author

Correspondence to Michael Kruse.

Editor information

Editors and Affiliations

Lawrence Livermore National Laboratory, Livermore, CA, USA
Bronis R. de Supinski
Barcelona Supercomputing Center, Barcelona, Spain
Pedro Valero-Lara
Universitat Politècnica de Catalunya, Barcelona, Spain
Xavier Martorell
Barcelona Supercomputing Center, Barcelona, Spain
Sergi Mateo Bellido
Universitat Politècnica de Catalunya, Barcelona, Spain
Jesus Labarta

Cite this paper

Kruse, M., Finkel, H. (2018). A Proposal for Loop-Transformation Pragmas. In: de Supinski, B., Valero-Lara, P., Martorell, X., Mateo Bellido, S., Labarta, J. (eds) Evolving OpenMP for Evolving Architectures. IWOMP 2018. Lecture Notes in Computer Science, vol 11128. Springer, Cham. https://doi.org/10.1007/978-3-319-98521-3_3

DOI: https://doi.org/10.1007/978-3-319-98521-3_3
Published: 29 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98520-6
Online ISBN: 978-3-319-98521-3
eBook Packages: Computer Science, Computer Science (R0)