research-article
Open access

Improving Loop Dependence Analysis

Published: 22 August 2017

Abstract

Programmers can no longer rely on new processors to deliver significantly better single-thread performance. Instead, gains have to come from other sources, such as the compiler and its optimization passes. Advanced passes rely on information about loop dependencies. We improve the quality of that information by reusing the information the programmer has already supplied for parallelization. We have implemented a prototype based on GCC, to which we also add a new optimization pass. Our approach increases the number of correctly classified dependencies, resulting in a 46% average improvement in single-thread performance for kernel benchmarks compared to GCC 6.1.
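
To make the idea concrete, the sketch below shows the kind of loop where programmer-supplied parallelization information could sharpen dependence analysis. It is an illustration only, not code from the paper or its GCC prototype: the function name scale_add and its parameters are hypothetical, and the choice of an OpenMP worksharing directive is an assumption based on the article's author tags. Through plain pointers, a compiler generally cannot prove that dst and src do not overlap, so it has to assume a possible loop-carried dependence. The directive records the programmer's assertion that iterations are independent.

```c
#include <stddef.h>

/* Hypothetical kernel (name and signature are assumptions for illustration).
 * dst and src are plain pointers, so static analysis alone cannot rule out
 * that they overlap, and a conservative compiler must assume a possible
 * loop-carried dependence. */
void scale_add(float *dst, const float *src, float k, size_t n)
{
    /* The worksharing directive states that iterations may run in any
     * order, which implies there is no loop-carried dependence. That is
     * the kind of programmer-provided fact a dependence analysis could
     * reuse, for example when deciding whether the loop can be vectorized. */
    #pragma omp parallel for
    for (size_t i = 0; i < n; ++i) {
        dst[i] = dst[i] + k * src[i];
    }
}
```

Compiled without OpenMP support, the pragma is ignored and the loop runs serially with the same result, so the annotation changes only what the compiler is allowed to assume, not what the loop computes.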

Supplementary Material

TACO1403-22 (taco1403-22.pdf)
Slide deck associated with this paper

Cited By

  • (2019) “It looks like you’re writing a parallel loop”: a machine learning based parallelization assistant. Proceedings of the 6th ACM SIGPLAN International Workshop on AI-Inspired and Empirical Methods for Software Engineering on Parallel Computing Systems, 1-10. DOI: 10.1145/3358500.3361567. Online publication date: 22-Oct-2019

Published In

ACM Transactions on Architecture and Code Optimization  Volume 14, Issue 3
September 2017
278 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/3132652
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 August 2017
Accepted: 01 May 2017
Revised: 01 April 2017
Received: 01 October 2016
Published in TACO Volume 14, Issue 3

Author Tags

  1. OpenMP
  2. SIMD
  3. automatic vectorization

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • ARTEMIS

Article Metrics

  • Downloads (Last 12 months): 260
  • Downloads (Last 6 weeks): 28
Reflects downloads up to 12 Sep 2024
