DOI:10.1145/1366230.1366266

Compiling for an indirect vector register architecture

Published: 05 May 2008

Abstract

The iVMX architecture contains a novel vector register file of up to 4096 vector registers that are accessed indirectly via a mapping mechanism, providing compatibility with the VMX architecture and the potential for dramatic performance benefits [7]. Using the large register file and the unique indirection mechanism efficiently poses compilation challenges: the indirection mechanism makes the spatial locality of registers, and the interaction among destination and source operands, central to register allocation, and the many vector registers call for aggressive automatic vectorization.
This work is a first step in addressing the compilability of iVMX, following the presentation and validation of its architectural aspects [7]. In this paper we present several compilation approaches for dealing with the mapping mechanism, together with an outer-loop vectorization transformation developed to promote the use of many vector registers. We modified an existing register allocator to target all available registers and added a post-pass that renames live ranges, taking into account spatial locality and the interaction among operand types. An FIR filter is used to demonstrate the effectiveness of these techniques in comparison with a version hand-optimized for iVMX. Initial results show that we can reduce the overhead of map management to 29% of the total instruction count, compared to 22% obtained manually and 49% obtained using a naive scheme, while outperforming an equivalent VMX implementation by a factor of 2.
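To make the outer-loop idea concrete, the following is a minimal sketch in plain C of an FIR filter written two ways: a scalar reference, and a version restructured so that a block of consecutive outputs advances together, which is the loop shape an outer-loop vectorizer aims for. It is not the paper's implementation and does not model iVMX or its mapping mechanism. The function names, the `VLEN` constant, and the use of `int` samples are assumptions made for illustration only.

```c
#include <stddef.h>

/* Assumed number of lanes per vector register; illustrative only. */
#define VLEN 4

/* Scalar reference: y[i] = sum over k of h[k] * x[i + k].
 * x must hold at least n + taps - 1 samples. */
void fir_scalar(const int *x, const int *h, int *y, size_t n, size_t taps)
{
    for (size_t i = 0; i < n; i++) {
        int acc = 0;
        for (size_t k = 0; k < taps; k++)
            acc += h[k] * x[i + k];
        y[i] = acc;
    }
}

/* Outer-loop form: VLEN consecutive outputs are computed together.
 * The lane loops stand in for single vector operations. */
void fir_outer(const int *x, const int *h, int *y, size_t n, size_t taps)
{
    size_t i = 0;

    for (; i + VLEN <= n; i += VLEN) {
        int acc[VLEN] = {0};              /* one vector accumulator */

        for (size_t k = 0; k < taps; k++) {
            int coeff = h[k];             /* coefficient broadcast to all lanes */
            for (size_t lane = 0; lane < VLEN; lane++)
                acc[lane] += coeff * x[i + lane + k];
        }
        for (size_t lane = 0; lane < VLEN; lane++)
            y[i + lane] = acc[lane];
    }

    /* Scalar tail for outputs that do not fill a whole block. */
    for (; i < n; i++) {
        int acc = 0;
        for (size_t k = 0; k < taps; k++)
            acc += h[k] * x[i + k];
        y[i] = acc;
    }
}
```

In this shape each block of outputs needs one accumulator, and the inner loop performs no cross-lane reduction. Processing several blocks at once would require correspondingly more live accumulators and input vectors, which is one plausible reason a large register file pairs well with this transformation.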

References

[1]
R. Allen and K. Kennedy. Automatic translation of Fortran programs to vector form. ACM Trans. on Programming Languages and Systems, 9(4):491--542, October 1987.
[2]
R. Allen and K. Kennedy. Optimizing Compilers for Modern Architectures. Morgan Kaufmann, 2001.
[3]
Aart Bik. The Software Vectorization Handbook. Applying Multimedia Extensions for Maximum Performance. Intel Press, 2004.
[4]
D. Callahan, S. Carr, and K. Kennedy. Improving register allocation for subscripted variables. In PLDI, pages 53--65, June 1990.
[5]
D. Callahan, S. Carr, and K. Kennedy. Improving register allocation for subscripted variables. In PLDI, pages 53--65, June 1990.
[6]
D. Callahan, S. Carr, and K. Kennedy. Improving register allocation for subscripted variables. In PLDI, pages 53--65, June 1990.
[7]
J. H. Derby, R. K. Montoye, and J. Moreira. Victoria - a VMX indirect compute technology oriented towards in-line acceleration. In Computing Frontiers, May 2006.
[8]
J. H. Derby and J. H. Moreno. A high-performance embedded DSP core with novel SIMD features. In ICASSP, 2003.
[9]
Free Software Foundation. gcc.gnu.org/projects/tree-ssa/vectorization.html.
[10]
Freescale Semiconductor, http://www.freescale.com. AltiVec real FIR, October 2002.
[11]
H. C. Hunter and J. H. Moreno. A new look at exploiting data parallelism in embedded systems. In CASES, pages 159--169, October 2003.
[12]
S. Kim and S. Moon. Rotating register allocation for enhanced pipeline scheduling. In PACT, 2007.
[13]
J. H. Moreno, V. Zyuban, U. Shvadron, F. Neeser, J. Derby, M. Ware, K. Kailas, A. Zaks, A. Geva, S. Ben-David, S. Asaad, T. Fox, M. Biberstein, D. Naishlos, and H. Hunter. An innovative low-power high-performance programmable signal processor for digital communications. IBM J. of R&D, March 2003.
[14]
S. S. Muchnick. Advanced Compiler Design and Implementation. Morgan Kaufmann, 1997.
[15]
D. Naishlos. Autovectorization in GCC. In GCC Developers' Summit, pages 105--118, June 2004.
[16]
D. Naishlos, M. Biberstein, S. Ben-David, and A. Zaks. Vectorizing for a SIMdD DSP architecture. In CASES, pages 2--11, October 2003.
[17]
M. Namolaru. Register allocation techniques for iVMX architecture. In Int'l Workshop on GCC for Research in Embedded and Parallel Systems, September 2007.
[18]
D. Nuzman and A. Zaks. Autovectorization in GCC - two years later. In GCC Developers' Summit, June 2006.
[19]
P. R. Panda, N. D. Dutt, and A. Nicolau. Efficient utilization of scratch-pad memory in embedded processor applications. In European Design and Test Conf., March 1997.
[20]
M. Postiff. Compiler and Microarchitecture Mechanisms for Exploiting Registers to Improve Memory Performance. PhD thesis, U. of Michigan, 2001.
[21]
R. G. Scarborough and H. G. Kolsky. A vectorizing Fortran compiler. IBM J. of R&D, 30(2):163--171, March 1986.
[22]
J. Shin, J. Chame, and M. W. Hall. Compiler-controlled caching in superword register files for multimedia extension architectures. In PACT, September 2002.
[23]
N. Sreraman and R. Govindarajan. A vectorizing compiler for multimedia extensions. Int'l J. of Parallel Programming, 28(4):363--400, August 2000.
[24]
C. Tenllado, L. Piñuel, M. Prieto, and F. Catthoor. Pack transposition: Enhancing superword level parallelism exploitation. In Parallel Computing, 2005.
[25]
C. Tenllado, L. Piñuel, M. Prieto, F. Tirado, and F. Catthoor. Improving superword level parallelism support in modern compilers. In CODES+ISSS, 2005.
[26]
M. Wolfe. High Performance Compilers for Parallel Computing. Addison Wesley, 1996.
[27]
P. Wu, A. E. Eichenberger, A. Wang, and P. Zhao. An integrated simdization framework using virtual vectors. In ICS, June 2005.

Published In

CF '08: Proceedings of the 5th conference on Computing frontiers
May 2008
334 pages
ISBN:9781605580777
DOI:10.1145/1366230

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. compiler controlled cache
  2. data reuse
  3. rotating register file
  4. simd
  5. subword parallelism
  6. vectorization
  7. viterbi

Qualifiers

  • Research-article

Conference

CF '08
CF '08: Computing Frontiers Conference
May 5 - 7, 2008
Ischia, Italy

Acceptance Rates

Overall Acceptance Rate 273 of 785 submissions, 35%

Cited By

  • (2012) Storage Allocation for Streaming-Based Register File. Energy-Aware Memory Management for Embedded Multimedia Systems, pages 151-194. DOI: 10.1201/b11418-6. Online publication date: 4-Jan-2012
  • (2012) Automatic efficient data layout for multithreaded stencil codes on CPUs and GPUs. 2012 19th International Conference on High Performance Computing, pages 1-10. DOI: 10.1109/HiPC.2012.6507504. Online publication date: Dec-2012
  • (2010) ACOTES Project: Advanced Compiler Technologies for Embedded Streaming. International Journal of Parallel Programming, 39:3, pages 397-450. DOI: 10.1007/s10766-010-0132-7. Online publication date: 20-Apr-2010
  • (2009) SARA. Proceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis, pages 41-50. DOI: 10.1145/1629435.1629442. Online publication date: 11-Oct-2009
  • (2008) Outer-loop vectorization. Proceedings of the 17th international conference on Parallel architectures and compilation techniques, pages 2-11. DOI: 10.1145/1454115.1454119. Online publication date: 25-Oct-2008
