Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3178433.3178440acmconferencesArticle/Chapter ViewAbstractPublication PagesppoppConference Proceedingsconference-collections
research-article

A Data Layout Transformation for Vectorizing Compilers

Published: 24 February 2018 Publication History

Abstract

Modern processors are often equipped with vector instruction sets. Such instructions operate on multiple elements of data at once, and greatly improve performance for specific applications. A programmer has two options to take advantage of these instructions: writing manually vectorized code, or using an auto-vectorizing compiler. In the latter case, he only has to place annotations to instruct the auto-vectorizing compiler to vectorize a particular piece of code. Thanks to auto-vectorization, the source program remains portable, and the programmer can focus on the task at hand instead of the low-level details of intrinsics programming. However, the performance of the vectorized program strongly depends on the precision of the analyses performed by the vectorizing compiler. In this paper, we improve the precision of these analyses by selectively splitting stack-allocated variables of a structure or aggregate type. Without this optimization, automatic vectorization slows the execution down compared to the scalar, non-vectorized code. When this optimization is enabled, we show that the vectorized code can be as fast as hand-optimized, manually vectorized implementations.

References

[1]
John R. Allen, Ken Kennedy, Carrie Porterfield, and Joe D. Warren. 1983. Conversion of Control Dependence to Data Dependence. In Conference Record of the Tenth Annual ACM Symposium on Principles of Programming Languages, Austin, Texas, USA, January 1983. 177--189.
[2]
Randy Allen and Ken Kennedy. 1987. Automatic Translation of Fortran Programs to Vector Form. ACM Trans. Program. Lang. Syst. 9, 4 (1987), 491--542.
[3]
Randy Allen and Ken Kennedy. 2001. Optimizing Compilers for Modern Architectures: A Dependence-based Approach. Morgan Kaufmann.
[4]
W. Paul Cockshott. 2002. Vector Pascal an array language for multimedia code. In APL. 83--91.
[5]
Bruno Coutinho, Diogo Sampaio, Fernando Magno Quintão Pereira, and Wagner Meira Jr. 2011. Divergence Analysis and Optimizations. In 2011 International Conference on Parallel Architectures and Compilation Techniques, PACT 2011, Galveston, TX, USA, October 10-14, 2011. 320--329.
[6]
Adin D. Falkoff and Kenneth E. Iverson. 1973. The Design of APL. IBM Journal of Research and Development 17, 5 (1973), 324--334.
[7]
Paul Feautrier. 1991. Dataflow analysis of array and scalar references. International Journal of Parallel Programming 20, 1 (1991), 23--53.
[8]
Michael Haidl, Simon Moll, Lars Klein, Huihui Sun, Sebastian Hack, and Sergei Gorlatch. 2017. PACXXv2 + RV: An LLVM-based Portable High-Performance Programming Model. In Proceedings of the Fourth Workshop on the LLVM Compiler Infrastructure in HPC, LLVM-HPC@SC 2017, Denver, CO, USA, November 13, 2017. 7:1--7:12.
[9]
Intel Corporation. 2013. Intel® Cilk™ Plus Language Extension Specification (version 1.2 ed.).
[10]
Ralf Karrenberg. 2015. Automatic SIMD Vectorization of SSA-based Control Flow Graphs. Springer.
[11]
Ralf Karrenberg and Sebastian Hack. 2011. Whole-function vectorization. In Proceedings of the CGO 2011, The 9th International Symposium on Code Generation and Optimization, Chamonix, France, April 2-6, 2011. 141--150.
[12]
Ralf Karrenberg and Sebastian Hack. 2012. Improving Performance of OpenCL on CPUs. In Compiler Construction - 21st International Conference, CC 2012, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2012, Tallinn, Estonia, March 24-April 1, 2012. Proceedings. 1--20.
[13]
Samuel Larsen and Saman P. Amarasinghe. 2000. Exploiting superword level parallelism with multimedia instruction sets. In Proceedings of the 2000 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), Vancouver, Britith Columbia, Canada, June 18-21, 2000. 145--156.
[14]
Yunsup Lee, Ronny Krashinsky, Vinod Grover, Stephen W. Keckler, and Krste Asanovic. 2013. Convergence and scalarization for data-parallel architectures. In Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, CGO 2013, Shenzhen, China, February 23-27, 2013. 32:1--32:11.
[15]
Chris J. Newburn, Byoungro So, Zhenying Liu, Michael D. McCool, Anwar M. Ghuloum, StefanusDu Toit, Zhi-Gang Wang, Zhaohui Du, Yongjian Chen, Gansha Wu, Peng Guo, Zhanglin Liu, and Dan Zhang. 2011. Intel's Array Building Blocks: A retargetable, dynamic compiler and embedded language. In Proceedings of the CGO 2011, The 9th International Symposium on Code Generation and Optimization, Chamonix, France, April 2-6, 2011. 224--235.
[16]
Viet Nhu Ngo. 1995. Parallel Loop Transformation Techniques for Vector-based Multiprocessor Systems. Ph.D. Dissertation. Minneapolis, MN, USA. UMI Order No. GAX94-33091.
[17]
Dorit Nuzman, Sergei Dyshel, Erven Rohou, Ira Rosen, Kevin Williams, David Yuste, Albert Cohen, and Ayal Zaks. 2011. Vapor SIMD: Auto-vectorize once, run everywhere. In Proceedings of the CGO 2011, The 9th International Symposium on Code Generation and Optimization, Chamonix, France, April 2-6, 2011. 151--160.
[18]
Dorit Nuzman and Richard Henderson. 2006. Multi-platform Auto-vectorization. In Fourth IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2006), 26-29 March 2006, New York, New York, USA. 281--294.
[19]
Dorit Nuzman, Ira Rosen, and Ayal Zaks. 2006. Auto-vectorization of interleaved data for SIMD. In Proceedings of the ACM SIGPLAN 2006 Conference on Programming Language Design and Implementation, Ottawa, Ontario, Canada, June 11-14, 2006. 132--143.
[20]
Dorit Nuzman and Ayal Zaks. 2008. Outer-loop vectorization: revisited for short SIMD architectures. In 17th International Conference on Parallel Architecture and Compilation Techniques, PACT 2008, Toronto, Ontario, Canada, October 25-29, 2008. 2--11.
[21]
Vasileios Porpodas and Timothy M. Jones. 2015. Throttling Automatic Vectorization: When Less is More. In 2015 International Conference on Parallel Architecture and Compilation, PACT 2015, San Francisco, CA, USA, October 18-21, 2015. 432--444.
[22]
Vasileios Porpodas, Alberto Magni, and Timothy M. Jones. 2015. PSLP: padded SLP automatic vectorization. In Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO 2015, San Francisco, CA, USA, February 07-11, 2015. 190--201.
[23]
Jaewook Shin, Mary W. Hall, and Jacqueline Chame. 2005. Superword-Level Parallelism in the Presence of Control Flow. In 3nd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2005), 20-23 March 2005, San Jose, CA, USA. 165--175.
[24]
Artjoms Sinkarovs and Sven-Bodo Scholz. 2013. Semantics-preserving data layout transformations for improved vectorisation. In Proceedings of the 2nd ACM SIGPLAN workshop on Functional high-performance computing, Boston, MA, USA, FHPC@ICFP 2013, September 25-27, 2013. 59--70.
[25]
Konrad Trifunovic, Dorit Nuzman, Albert Cohen, Ayal Zaks, and Ira Rosen. 2009. Polyhedral-Model Guided Loop-Nest Auto-Vectorization. In PACT 2009, Proceedings of the 18th International Conference on Parallel Architectures and Compilation Techniques, 12-16 September 2009, Raleigh, North Carolina, USA. 327--337.
[26]
Shixiong Xu and David Gregg. 2014. Semi-automatic Composition of Data Layout Transformations for Loop Vectorization. In Network and Parallel Computing -11th IFIP WG 10.3 International Conference, NPC 2014, Ilan, Taiwan, September 18-20, 2014. Proceedings. 485--496.

Cited By

View all
  • (2020)Data Layout Transformation for Stencil Computations Using ARM NEON Extension2020 IEEE 22nd International Conference on High Performance Computing and Communications; IEEE 18th International Conference on Smart City; IEEE 6th International Conference on Data Science and Systems (HPCC/SmartCity/DSS)10.1109/HPCC-SmartCity-DSS50907.2020.00023(180-188)Online publication date: Dec-2020
  • (2019)Toward a graph-based dependence analysis framework for high level design verificationProceedings of the 16th ACM International Conference on Computing Frontiers10.1145/3310273.3323433(308-316)Online publication date: 30-Apr-2019

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
WPMVP'18: Proceedings of the 2018 4th Workshop on Programming Models for SIMD/Vector Processing
February 2018
68 pages
ISBN:9781450356466
DOI:10.1145/3178433
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 February 2018

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Compiler
  2. Optimization
  3. Vectorization

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Bundesministerium für Bildung und Forschung

Conference

PPoPP '18

Acceptance Rates

WPMVP'18 Paper Acceptance Rate 8 of 12 submissions, 67%;
Overall Acceptance Rate 20 of 30 submissions, 67%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)17
  • Downloads (Last 6 weeks)2
Reflects downloads up to 13 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2020)Data Layout Transformation for Stencil Computations Using ARM NEON Extension2020 IEEE 22nd International Conference on High Performance Computing and Communications; IEEE 18th International Conference on Smart City; IEEE 6th International Conference on Data Science and Systems (HPCC/SmartCity/DSS)10.1109/HPCC-SmartCity-DSS50907.2020.00023(180-188)Online publication date: Dec-2020
  • (2019)Toward a graph-based dependence analysis framework for high level design verificationProceedings of the 16th ACM International Conference on Computing Frontiers10.1145/3310273.3323433(308-316)Online publication date: 30-Apr-2019

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media