DOI: 10.1145/3528425.3529100
research-article
Public Access

Exploring source-to-source compiler transformation of OpenMP SIMD constructs for Intel AVX and Arm SVE vector architectures

Published: 18 April 2022

Abstract

Over the past decade, SIMD (single instruction, multiple data) or vector architectures have made significant advances and now exist across a wide range of devices, from commodity CPUs to high performance computing (HPC) cores. Intel's AVX (Advanced Vector Extensions) has been one of the most popular SIMD extensions for Intel's commodity and HPC CPUs. Over the past few years, Arm has made significant inroads with its new SVE (Scalable Vector Extension), which is used in Fugaku, the supercomputer ranked first on the Top500 list. As SIMD has become more advanced and more important, it has become equally important that compilers support these architecture extensions. In this paper, we present our approach to source-to-source compiler transformation of explicit vectorization constructs expressed with the OpenMP SIMD directive. We present the design of a unified IR that is easily translated to the AVX and SVE vector architectures. Finally, we conduct performance evaluations on Intel AVX and Arm SVE to demonstrate how this method of vectorization can bridge the gap between automatic and manual vectorization.
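The abstract does not show what the transformation produces. As a hedged illustration only (not the authors' IR or generated code), the C sketch below contrasts an input loop annotated with `#pragma omp simd` against a scalar emulation of a strip-mined, predicated loop, a shape that maps naturally onto both targets. On SVE, the lane predicate corresponds to `svwhilelt_b32` and the masked loads and stores to `svld1_f32` and `svst1_f32`; on AVX-512, to an `__mmask16` used with `_mm512_maskz_loadu_ps` and `_mm512_mask_storeu_ps`. The fixed lane count `VL`, the function names, and the helper `whilelt` are hypothetical, chosen so the example compiles with any C compiler; real SVE code would query the vector length at run time with `svcntw()`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed lane count: 16 floats fill one 512-bit AVX-512 register.
 * Real SVE code would not hard-code this; it would query svcntw() at run time. */
#define VL 16

/* Input form: an explicitly vectorized loop using the OpenMP SIMD directive.
 * Without -fopenmp or -fopenmp-simd the pragma is ignored and the loop runs as scalar C. */
void saxpy_omp(size_t n, float a, const float *x, float *y) {
#pragma omp simd
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Hypothetical helper: lane j is active iff i + j < n.
 * Stands in for svwhilelt_b32 on SVE or a tail __mmask16 on AVX-512. */
static void whilelt(size_t i, size_t n, bool pg[VL]) {
    for (size_t j = 0; j < VL; j++)
        pg[j] = i + j < n;
}

/* Scalar emulation of a strip-mined, predicated loop: each outer iteration
 * handles one "vector" of VL lanes, and the predicate masks off the tail,
 * so no separate scalar remainder loop is needed. */
void saxpy_predicated(size_t n, float a, const float *x, float *y) {
    bool pg[VL];
    float vx[VL], vy[VL];
    for (size_t i = 0; i < n; i += VL) {
        whilelt(i, n, pg);
        /* Masked loads (cf. svld1_f32 / _mm512_maskz_loadu_ps): inactive lanes read 0
         * and never touch memory past the end of x or y. */
        for (size_t j = 0; j < VL; j++) {
            vx[j] = pg[j] ? x[i + j] : 0.0f;
            vy[j] = pg[j] ? y[i + j] : 0.0f;
        }
        /* Lane-wise multiply-add (cf. svmla_f32_x / _mm512_fmadd_ps). */
        for (size_t j = 0; j < VL; j++)
            vy[j] = a * vx[j] + vy[j];
        /* Masked store (cf. svst1_f32 / _mm512_mask_storeu_ps): only active lanes write. */
        for (size_t j = 0; j < VL; j++)
            if (pg[j])
                y[i + j] = vy[j];
    }
}
```

Handling the loop tail with a predicate instead of a scalar epilogue lets one loop shape serve both a fixed-width ISA (AVX-512 masks) and a vector-length-agnostic one (SVE), which is one plausible reason a single IR can target both. Note that AVX2 has no `__mmask` registers, so a tail there would need `_mm256_maskload_ps`-style integer masks or a scalar remainder.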


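Cited By

  • (2024) MIMD Programs Execution Support on SIMD Machines: A Holistic Survey. IEEE Access 12, 34354-34377. DOI: 10.1109/ACCESS.2024.3372990. Online publication date: 2024.
  • (2023) Comparative Study on Distributed Lightweight Deep Learning Models for Road Pothole Detection. Sensors 23, 9, 4347. DOI: 10.3390/s23094347. Online publication date: 27-Apr-2023.
  • (2023) Transpilers: A Systematic Mapping Review of Their Usage in Research and Industry. Applied Sciences 13, 6, 3667. DOI: 10.3390/app13063667. Online publication date: 13-Mar-2023.
  • (2023) Convolutional Acceleration Algorithm Combining Loop Optimization and Automatic Scheduling. 2023 International Conference for Advancement in Technology (ICONAT), 1-7. DOI: 10.1109/ICONAT57137.2023.10080410. Online publication date: 24-Jan-2023.
  • (2023) A modular approach to build a hardware testbed for cloud resource management research. The Journal of Supercomputing 80, 8, 10552-10583. DOI: 10.1007/s11227-023-05856-2. Online publication date: 27-Dec-2023.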


Published In

PMAM '22: Proceedings of the Thirteenth International Workshop on Programming Models and Applications for Multicores and Manycores
April 2022
68 pages
ISBN: 9781450393393
DOI: 10.1145/3528425
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Intel AVX2, AVX-512
  2. OpenMP
  3. SIMD
  4. Arm SVE
  5. compiler transformation
  6. vectorization

Qualifiers

  • Research-article


Conference

PPoPP '22

Acceptance Rates

Overall Acceptance Rate 53 of 97 submissions, 55%


Article Metrics

  • Downloads (last 12 months): 221
  • Downloads (last 6 weeks): 35
Reflects downloads up to 12 Sep 2024.

