Abstract
Pragmas for loop transformations, such as unrolling, are implemented in most mainstream compilers. They are used by application programmers because of their ease of use compared to directly modifying the source code of the relevant loops. We propose additional pragmas for common loop transformations that go far beyond the transformations today’s compilers provide and should make most source rewriting for the sake of loop optimization unnecessary. To encourage compilers to implement these pragmas, and to avoid a diversity of incompatible syntaxes, we would like to spark a discussion about an inclusion to the OpenMP standard.
The U.S. government retains certain licensing rights. This is a U.S. government work and certain licensing rights apply
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Bagnères, L., Zinenko, O., Huot, S., Bastoul, C.: Opening polyhedral compiler’s black box. In: 14th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2016). IEEE (2016)
Attributes in Clang. http://clang.llvm.org/docs/AttributeReference.html
Auto-Vectorization in LLVM. http://llvm.org/docs/Vectorizers.html
Clang Language Extensions. http://clang.llvm.org/docs/LanguageExtensions.html
Doerfert, J.: [RFC] abstract parallel IR optimizations. llvm-dev mailing list post, June 2018. http://lists.llvm.org/pipermail/llvm-dev/2018-June/123841.html
Dolbeau, R., Bihan, S., Bodin, F.: HMPP™: a hybrid multi-core parallel programming environment. In: First Workshop on General Purpose Processing on Graphics Processing Units (GPGPU 2007) (2007)
Donadio, S., et al.: A language for the compact representation of multiple program versions. In: Ayguadé, E., Baumgartner, G., Ramanujam, J., Sadayappan, P. (eds.) LCPC 2005. LNCS, vol. 4339, pp. 136–151. Springer, Heidelberg (2006). https://doi.org/10.1007/978-3-540-69330-7_10
Edwards, H.C., Trott, C.R., Sunderland, D.: Kokkos: enabling manycore performance portability through polymorphic memory access patterns. J. Parallel Distrib. Comput. 74(12), 3202–3216 (2014)
Finkel, H., Doerfert, J., Tian, X., Stelle, G.: A parallel IR in real life: optimizing OpenMP. EuroLLVM 2018 presentation (2018). http://llvm.org/devmtg/2018-04/talks.html#Talk_1
Finkel, H., Tian, X.: [RFC] IR-level region annotations. llvm-dev mailing list post, January 2017. http://lists.llvm.org/pipermail/llvm-dev/2017-January/108906.html
Free Software Foundation: Loop-Specific Pragmas. https://gcc.gnu.org/onlinedocs/gcc/Loop-Specific-Pragmas.html
Girbal, S., et al.: Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies. Int. J. Parallel Program. 34(3), 261–317 (2006)
Grosser, T., Zheng, H., Aloor, R., Simbürger, A., Größlinger, A., Pouchet, L.N.: Polly - polyhedral optimization in LLVM. In: First International Workshop on Polyhedral Compilation Techniques (IMPACT 2011) (2011)
Hartono, A., Norris, B., Sadayappan, P.: Annotation-based empirical performance tuning using Orio. In: Proceedings of the 23rd IEEE International Parallel and Distributed Computing Symposium (IPDPS 2009). IEEE (2009)
Hornung, R.D., Keasler, J.A.: The RAJA portability layer: overview and status. Technical report LLNL-TR-661403, Lawrence Livermore National Lab (2014)
IBM: Product documentation for XL C/C++ for AIX, V13.1.3
Intel: Threading Building Blocks. https://www.threadingbuildingblocks.org
Intel: Intel C++ Compiler 18.0 Developer Guide and Reference, May 2018
International Organization for Standardization: ISO/IEC 14882:2017, December 2017
Kelly, W., Pugh, W.: A framework for unifying reordering transformations. Technical report UMIACS-TR-93-134/CS-TR-3193, University of Maryland (1992)
Low, T.M., Igual, F.D., Smith, T.M., Quintana-Orti, E.S.: Analytical modeling is enough for high-performance BLIS. Trans. Math. Softw. (TOMS) 43(2), 12:1–12:18 (2016)
Microsoft: C/C++ Preprocessor Reference. http://docs.microsoft.com/en-us/cpp/preprocessor/loop
Müller-Pfefferkorn, R., Nagel, W.E., Trenkler, B.: Optimizing cache access: a tool for source-to-source transformations and real-life compiler tests. In: Danelutto, M., Vanneschi, M., Laforenza, D. (eds.) Euro-Par 2004. LNCS, vol. 3149, pp. 72–81. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-27866-5_10
OpenACC-Standard.org: The OpenACC Application Programming Interface Version 4.0, November 2017
OpenMP Architecture Review Board: OpenMP Application Program Interface Version 4.0, July 2017
Ragan-Kelley, J., Barnes, C., Adams, A., Paris, S., Durand, F., Amarasinghe, S.: Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. In: Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2013), pp. 519–530. ACM (2013)
Saito, H.: Extending LoopVectorizer towards supporting OpenMP4.5 SIMD and outer loop auto-vectorization. EuroLLVM 2018 presentation (2016). http://llvm.org/devmtg/2016-11/#talk7
Schardl, T.B., Moses, W.S., Leiserson, C.E.: Tapir: embedding fork-join parallelism into LLVM’s intermediate representation. In: Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2017), pp. 249–265. ACM (2017)
Tian, X., et al.: LLVM framework and IR extensions for parallelization, SIMD vectorization and offloading. In: Third Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC 2016). IEEE (2016)
Tiwari, A., Chen, C., Chame, J., Hall, M., Hollingsworth, J.K.: A scalable auto-tuning framework for compiler optimization. In: Proceedings of the 23rd IEEE International Parallel and Distributed Computing Symposium (IPDPS 2009). IEEE (2009)
Vasilache, N., et al.: Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions. CoRR abs/1802.04730 (2018)
Verdoolaege, S., Guelton, S., Grosser, T., Cohen, A.: Schedule trees. In: Fourth International Workshop on Polyhedral Compilation Techniques (IMPACT 2014) (2014)
Yi, Q., Seymour, K., You, H., Vuduc, R., Quinlan, D.: POET: parameterized optimizations for empirical tuning. In: Proceedings of the 21st IEEE International Parallel And Distributed Computing Symposium (IPDPS 2007). IEEE (2007)
Zinenko, O., Huot, S., Bastoul, C.: Clint: a direct manipulation tool for parallelizing compute-intensive program parts. In: 2014 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC). IEEE (2014)
Acknowledgments
This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of two U.S. Department of Energy organizations (Office of Science and the National Nuclear Security Administration) responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, hardware, advanced system engineering, and early testbed platforms, in support of the nations exascale computing imperative.
This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Kruse, M., Finkel, H. (2018). A Proposal for Loop-Transformation Pragmas. In: de Supinski, B., Valero-Lara, P., Martorell, X., Mateo Bellido, S., Labarta, J. (eds) Evolving OpenMP for Evolving Architectures. IWOMP 2018. Lecture Notes in Computer Science(), vol 11128. Springer, Cham. https://doi.org/10.1007/978-3-319-98521-3_3
Download citation
DOI: https://doi.org/10.1007/978-3-319-98521-3_3
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98520-6
Online ISBN: 978-3-319-98521-3
eBook Packages: Computer ScienceComputer Science (R0)