Article

Superword-Level Parallelism in the Presence of Control Flow

Authors:

Jacqueline Chame

CGO '05: Proceedings of the international symposium on Code generation and optimization

Pages 165 - 175

https://doi.org/10.1109/CGO.2005.33

Published: 20 March 2005

Abstract

In this paper, we describe how to extend the concept of superword-level parallelization (SLP), used for multimedia extension architectures, so that it can be applied in the presence of control flow constructs. Superword-level parallelization involves identifying scalar instructions in a large basic block that perform the same operation and, if dependences do not prevent it, combining them into a superword operation on a multi-word object. A key insight is that we can use techniques related to optimizations for architectures supporting predicated execution, even for multimedia ISAs that do not provide hardware predication. We derive large basic blocks with predicated instructions to which SLP can be applied. We describe how to minimize overheads for superword predicates and re-introduce control flow for scalar operations. We discuss other extensions to SLP to address common features of real multimedia codes. We present automatically generated performance results on eight multimedia codes to demonstrate the power of this approach. We observe speedups ranging from 1.97X to 15.07X as compared to both sequential execution and SLP alone.
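The central idea of the abstract can be sketched in a few lines of Python. First, a loop body containing an if/else is if-converted so that the branch becomes a predicate. Then the now-isomorphic lane operations are packed into superword operations. This is a minimal illustration under stated assumptions, not the paper's implementation. The four-lane width, the example kernel, and the helper names (`v_gt`, `v_add`, `v_sub`, `v_select`) are hypothetical, and Python lists stand in for superword registers. Because the target ISA is assumed to lack hardware predication, each predicated superword operation is emulated by computing both sides of the branch in every lane and merging the results with a lane-wise select.

```python
from typing import List

# Assumed superword width: four 32-bit lanes, as in a 128-bit multimedia register.
WIDTH = 4


def scalar_kernel(a: List[int], t: int) -> List[int]:
    """Reference loop whose body contains control flow (an if/else)."""
    b = [0] * len(a)
    for i in range(len(a)):
        if a[i] > t:
            b[i] = a[i] - t
        else:
            b[i] = a[i] + t
    return b


# Hypothetical superword helpers: each models one SIMD instruction on WIDTH lanes.
def v_gt(x: List[int], y: List[int]) -> List[bool]:
    return [p > q for p, q in zip(x, y)]  # lane-wise compare produces a predicate mask


def v_add(x: List[int], y: List[int]) -> List[int]:
    return [p + q for p, q in zip(x, y)]


def v_sub(x: List[int], y: List[int]) -> List[int]:
    return [p - q for p, q in zip(x, y)]


def v_select(mask: List[bool], x: List[int], y: List[int]) -> List[int]:
    return [p if m else q for m, p, q in zip(mask, x, y)]  # blend lanes by predicate


def predicated_slp_kernel(a: List[int], t: int) -> List[int]:
    """If-converted loop: the branch becomes a predicate, then lanes are packed."""
    n = len(a)
    b = [0] * n
    t_vec = [t] * WIDTH                      # splat the scalar threshold into all lanes
    i = 0
    while i + WIDTH <= n:
        a_vec = a[i:i + WIDTH]               # superword load
        pred = v_gt(a_vec, t_vec)            # superword predicate for the 'if' test
        then_vec = v_sub(a_vec, t_vec)       # 'then' side, computed in every lane
        else_vec = v_add(a_vec, t_vec)       # 'else' side, computed in every lane
        b[i:i + WIDTH] = v_select(pred, then_vec, else_vec)  # merge without a branch
        i += WIDTH
    for j in range(i, n):                    # leftover elements keep the scalar branch
        b[j] = a[j] - t if a[j] > t else a[j] + t
    return b


if __name__ == "__main__":
    data = [5, 1, 9, 3, 7, 2]
    print(predicated_slp_kernel(data, 4))    # [1, 5, 5, 7, 3, 6]
```

Computing both sides in every lane is only safe when neither side can fault or has side effects. A predicated store, for example, would need a masked or merged store instead of an unconditional one. The trailing scalar loop simply handles trip counts that are not a multiple of the width and is not meant to represent the paper's technique for re-introducing control flow.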

Information

Published In

CGO '05: Proceedings of the international symposium on Code generation and optimization

March 2005

313 pages

ISBN: 076952298X

Publisher

IEEE Computer Society

United States

Publication History

Published: 20 March 2005

Qualifiers

Article

Conference

CGO05

Sponsor:

CGO05: 3rd Annual IEEE / ACM International Symposium on Code Generation and Optimization

March 20 - 23, 2005

Acceptance Rates

CGO '05 Paper Acceptance Rate 26 of 75 submissions, 35%

Overall Acceptance Rate 312 of 1,061 submissions, 29%

Bibliometrics & Citations

Total Citations: 65

Total Downloads: 598

Downloads (last 12 months): 7

Downloads (last 6 weeks): 0

Reflects downloads up to 11 Aug 2024

Cited By

Nuzman D, Zaks A, Ben-Zion Z, Rodríguez G, Sadayappan P, Sukumaran-Rajam A (2024). If-Convert as Early as You Must. Proceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction, pp. 26-38. doi:10.1145/3640537.3641562. Online publication date: 17-Feb-2024.
https://dl.acm.org/doi/10.1145/3640537.3641562

Paktinatkeleshteri R, de Carvalho J, Amiri E, Amaral J, Onut I, Shirani P, Branco P (2023). Efficient Auto-Vectorization for Control-flow Dependent Loops through Data Permutation. Proceedings of the 33rd Annual International Conference on Computer Science and Software Engineering, pp. 74-83. doi:10.5555/3615924.3615932. Online publication date: 11-Sep-2023.
https://dl.acm.org/doi/10.5555/3615924.3615932

Feng J, He Y, Tao Q, Ma H (2022). An SLP Vectorization Method Based on Equivalent Extended Transformation. Wireless Communications & Mobile Computing, 2022. doi:10.1155/2022/1832522. Online publication date: 1-Jan-2022.
https://dl.acm.org/doi/10.1155/2022/1832522

Liu B, Laird A, Tsang W, Mahjour B, Dehnavi M, Kloeckner A, Moreira J (2022). Combining Run-Time Checks and Compile-Time Analysis to Improve Control Flow Auto-Vectorization. Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, pp. 439-450. doi:10.1145/3559009.3569663. Online publication date: 8-Oct-2022.
https://dl.acm.org/doi/10.1145/3559009.3569663

Latifis I, Parashar K, Dimitroulakos G, Cappelle H, Lezos C, Masselos K, Catthoor F (2020). A Retargetable MATLAB-to-C Compiler Exploiting Custom Instructions and Data Parallelism. ACM Transactions on Embedded Computing Systems, 19:6 (1-27). doi:10.1145/3391898. Online publication date: 3-Oct-2020.
https://dl.acm.org/doi/10.1145/3391898

Mendis C, Yang C, Pu Y, Amarasinghe S, Carbin M, Wallach H, Larochelle H, Beygelzimer A, d'Alché-Buc F, Fox E (2019). Compiler auto-vectorization with imitation learning. Proceedings of the 33rd International Conference on Neural Information Processing Systems, pp. 14625-14635. doi:10.5555/3454287.3455597. Online publication date: 8-Dec-2019.
https://dl.acm.org/doi/10.5555/3454287.3455597

Porpodas V, Rocha R, Brevnov E, Góes L, Mattson T, Kandemir M, Jimborean A, Moseley T (2019). Super-Node SLP: optimized vectorization for code sequences containing operators and their inverse elements. Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization, pp. 206-216. doi:10.5555/3314872.3314897. Online publication date: 16-Feb-2019.
https://dl.acm.org/doi/10.5555/3314872.3314897

Sun H, Fey F, Zhao J, Gorlatch S, Eigenmann R, Ding C, McKee S (2019). WCCV. Proceedings of the ACM International Conference on Supercomputing, pp. 319-329. doi:10.1145/3330345.3331059. Online publication date: 26-Jun-2019.
https://dl.acm.org/doi/10.1145/3330345.3331059

Liu Y, Hong D, Wu J, Fu S, Hsu W (2019). Exploiting SIMD Asymmetry in ARM-to-x86 Dynamic Binary Translation. ACM Transactions on Architecture and Code Optimization, 16:1 (1-24). doi:10.1145/3301488. Online publication date: 13-Feb-2019.
https://dl.acm.org/doi/10.1145/3301488

Moll S, Hack S (2018). Partial control-flow linearization. ACM SIGPLAN Notices, 53:4 (543-556). doi:10.1145/3296979.3192413. Online publication date: 11-Jun-2018.
https://dl.acm.org/doi/10.1145/3296979.3192413