Exploring source-to-source compiler transformation of OpenMP SIMD constructs for Intel AVX and Arm SVE vector architectures

Authors:

Yonghong Yan

PMAM '22: Proceedings of the Thirteenth International Workshop on Programming Models and Applications for Multicores and Manycores

Pages 11 - 20

https://doi.org/10.1145/3528425.3529100

Published: 18 April 2022

Abstract

Over the past decade, SIMD (single instruction, multiple data) or vector architectures have advanced significantly and now span a wide range of devices, from commodity CPUs to high-performance computing (HPC) cores. Intel's AVX (Advanced Vector Extensions) has been one of the most popular SIMD extensions for Intel's commodity and HPC CPUs. Over the past few years, Arm has made significant inroads with its new SVE (Scalable Vector Extension), which is used in the supercomputer that holds the top place on the Top500 list. As SIMD has become more advanced and more important, it has become equally important that compilers support these architecture extensions. In this paper, we present our approach to source-to-source compiler transformation of explicit vectorization constructs expressed with the OpenMP SIMD directive. We present the design of a unified IR that is easily translated to the AVX and SVE vector architectures. Finally, we conduct performance evaluations on Intel AVX and Arm SVE to demonstrate how this method of vectorization can bridge the gap between auto-vectorization and manual vectorization.
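To make the idea of lowering an OpenMP SIMD loop to vector intrinsics concrete, the following is a minimal sketch. It is not the paper's generated code and does not reflect its unified IR. The kernel (a SAXPY loop), the function names `saxpy_omp_simd`, `saxpy_lowered`, and `saxpy_versions_agree`, and the fixed 8-lane strip-mining are all assumptions made for illustration. The first function is the kind of input such a tool might accept; the second is one plausible hand-lowered AVX form, with a scalar remainder loop and a scalar fallback when AVX is not enabled at compile time. An SVE lowering would differ in that it would typically use predicated, vector-length-agnostic loops rather than a fixed lane count.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__AVX__)
#include <immintrin.h>
#endif

/* Input form: explicit vectorization requested with the OpenMP SIMD directive.
   If the compiler is not asked to honor OpenMP pragmas, the pragma is ignored
   and the loop simply runs as written. */
void saxpy_omp_simd(size_t n, float a, const float *x, float *y)
{
    #pragma omp simd
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}

/* Hypothetical lowered form (illustration only): the loop is strip-mined by
   8 floats, the width of one 256-bit AVX register, and a scalar loop handles
   the leftover iterations. Without AVX, only the scalar loop runs. */
void saxpy_lowered(size_t n, float a, const float *x, float *y)
{
    size_t i = 0;
#if defined(__AVX__)
    const __m256 va = _mm256_set1_ps(a);        /* broadcast a to all lanes */
    for (; i + 8 <= n; i += 8) {
        __m256 vx = _mm256_loadu_ps(x + i);     /* unaligned vector loads */
        __m256 vy = _mm256_loadu_ps(y + i);
        vy = _mm256_add_ps(_mm256_mul_ps(va, vx), vy);
        _mm256_storeu_ps(y + i, vy);            /* unaligned vector store */
    }
#endif
    for (; i < n; i++) {                        /* remainder iterations */
        y[i] = a * x[i] + y[i];
    }
}

enum { SAXPY_MAX_N = 64 };

/* Runs both versions on the same small input and returns 1 if every element
   matches, 0 otherwise. The inputs are small integers, so all results are
   exactly representable as floats. */
int saxpy_versions_agree(size_t n)
{
    float x[SAXPY_MAX_N], y1[SAXPY_MAX_N], y2[SAXPY_MAX_N];
    assert(n <= SAXPY_MAX_N);
    for (size_t i = 0; i < n; i++) {
        x[i] = (float)i;
        y1[i] = 1.0f;
        y2[i] = 1.0f;
    }
    saxpy_omp_simd(n, 2.0f, x, y1);
    saxpy_lowered(n, 2.0f, x, y2);
    for (size_t i = 0; i < n; i++) {
        if (y1[i] != y2[i]) {
            return 0;
        }
    }
    return 1;
}
```

With GCC or Clang, `-mavx` enables the intrinsic path and `-fopenmp-simd` makes the compiler honor the `simd` pragma in the first function. Calling `saxpy_versions_agree` with a length that is not a multiple of 8 exercises both the vector body and the remainder loop.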

Published In

PMAM '22: Proceedings of the Thirteenth International Workshop on Programming Models and Applications for Multicores and Manycores

April 2022

68 pages

ISBN: 9781450393393

DOI: 10.1145/3528425

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Science Foundation

Conference

PPoPP '22

PPoPP '22: 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

April 2 - 6, 2022

Seoul, Republic of Korea

Acceptance Rates

Overall Acceptance Rate 53 of 97 submissions, 55%
