DOI: 10.1109/CGO.2005.33
Article

Superword-Level Parallelism in the Presence of Control Flow

Published: 20 March 2005
    Abstract

    In this paper, we describe how to extend the concept of superword-level parallelization (SLP), used for multimedia extension architectures, so that it can be applied in the presence of control flow constructs. Superword-level parallelization involves identifying scalar instructions in a large basic block that perform the same operation and, if dependences do not prevent it, combining them into a superword operation on a multi-word object. A key insight is that we can use techniques related to optimizations for architectures supporting predicated execution, even for multimedia ISAs that do not provide hardware predication. We derive large basic blocks with predicated instructions to which SLP can be applied. We describe how to minimize overheads for superword predicates and reintroduce control flow for scalar operations. We discuss other extensions to SLP to address common features of real multimedia codes. We present automatically generated performance results on 8 multimedia codes to demonstrate the power of this approach. We observe speedups ranging from 1.97X to 15.07X compared to both sequential execution and SLP alone.
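    The abstract's central idea is to turn control flow into predicates and then pack the resulting isomorphic predicated statements into superword operations. A small model can illustrate it. This is a sketch under stated assumptions, not the authors' compiler. It assumes a 4-lane superword, modeled as a Python tuple standing in for a SIMD register, and a hypothetical example loop. The helpers `vload`, `vcmpgt`, `vmul`, and `vselect` are also hypothetical. The target ISA is assumed to lack hardware predication, so the superword predicate is applied with the common bitwise select idiom `(mask & t) | (~mask & f)`.

```python
"""Toy model of if-conversion followed by superword (SIMD) selection.

Assumption: a superword is modeled as a tuple of LANES Python ints; this
illustrates the idea only and is not real SIMD code or the paper's compiler.
"""
from typing import List, Tuple

LANES = 4                  # assumed superword width, e.g. 4 x 32-bit in 128 bits
Superword = Tuple[int, ...]


def scalar_kernel(a: List[int], b: List[int]) -> List[int]:
    """Hypothetical source loop containing control flow."""
    out = list(b)
    for i in range(len(a)):
        if a[i] > 0:
            out[i] = a[i] * 2
    return out


def if_converted_kernel(a: List[int], b: List[int]) -> List[int]:
    """Same loop after if-conversion: the branch becomes a predicate."""
    out = list(b)
    for i in range(len(a)):
        p = a[i] > 0                  # predicate definition
        t = a[i] * 2                  # now executed unconditionally
        out[i] = t if p else out[i]   # assignment guarded by p
    return out


# Hypothetical superword helpers; each operates lane by lane.
def vload(xs: List[int], i: int) -> Superword:
    return tuple(xs[i:i + LANES])


def vcmpgt(x: Superword, y: Superword) -> Superword:
    # Comparison yields an all-ones (-1) or all-zeros (0) mask per lane.
    return tuple(-1 if xi > yi else 0 for xi, yi in zip(x, y))


def vmul(x: Superword, y: Superword) -> Superword:
    return tuple(xi * yi for xi, yi in zip(x, y))


def vselect(mask: Superword, t: Superword, f: Superword) -> Superword:
    # Bitwise select (and / and-not / or) stands in for hardware predication.
    return tuple((m & ti) | (~m & fi) for m, ti, fi in zip(mask, t, f))


def superword_kernel(a: List[int], b: List[int]) -> List[int]:
    """SLP-style version: LANES isomorphic predicated statements per step."""
    assert len(a) == len(b) and len(a) % LANES == 0, "no remainder loop here"
    out = list(b)
    zero, two = (0,) * LANES, (2,) * LANES
    for i in range(0, len(a), LANES):
        va, vb = vload(a, i), vload(out, i)
        mask = vcmpgt(va, zero)             # superword predicate
        vt = vmul(va, two)                  # computed in every lane
        out[i:i + LANES] = vselect(mask, vt, vb)
    return out


if __name__ == "__main__":
    a = [3, -1, 0, 5, -7, 2, 8, -4]
    b = [10, 20, 30, 40, 50, 60, 70, 80]
    for kernel in (scalar_kernel, if_converted_kernel, superword_kernel):
        print(kernel.__name__, kernel(a, b))  # each: [6, 20, 30, 10, 50, 4, 16, 80]
```

    All three versions produce the same result. Computing `a[i] * 2` in every lane is safe here because the multiply has no side effects. Stores or operations that can fault would need the predicate applied before they execute.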


    Information

    Published In

    CGO '05: Proceedings of the International Symposium on Code Generation and Optimization
    March 2005
    313 pages
    ISBN: 076952298X

    Publisher

    IEEE Computer Society

    United States

    Qualifiers

    • Article

    Conference

    CGO '05

    Acceptance Rates

    CGO '05 paper acceptance rate: 26 of 75 submissions (35%)
    Overall acceptance rate: 312 of 1,061 submissions (29%)

    Cited By

    • (2024) If-Convert as Early as You Must. Proceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction, pp. 26-38. DOI: 10.1145/3640537.3641562. Online publication date: 17 Feb 2024.
    • (2023) Efficient Auto-Vectorization for Control-flow Dependent Loops through Data Permutation. Proceedings of the 33rd Annual International Conference on Computer Science and Software Engineering, pp. 74-83. DOI: 10.5555/3615924.3615932. Online publication date: 11 Sep 2023.
    • (2022) An SLP Vectorization Method Based on Equivalent Extended Transformation. Wireless Communications & Mobile Computing, vol. 2022. DOI: 10.1155/2022/1832522. Online publication date: 1 Jan 2022.
    • (2022) Combining Run-Time Checks and Compile-Time Analysis to Improve Control Flow Auto-Vectorization. Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, pp. 439-450. DOI: 10.1145/3559009.3569663. Online publication date: 8 Oct 2022.
    • (2020) A Retargetable MATLAB-to-C Compiler Exploiting Custom Instructions and Data Parallelism. ACM Transactions on Embedded Computing Systems 19(6), pp. 1-27. DOI: 10.1145/3391898. Online publication date: 3 Oct 2020.
    • (2019) Compiler auto-vectorization with imitation learning. Proceedings of the 33rd International Conference on Neural Information Processing Systems, pp. 14625-14635. DOI: 10.5555/3454287.3455597. Online publication date: 8 Dec 2019.
    • (2019) Super-Node SLP: optimized vectorization for code sequences containing operators and their inverse elements. Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization, pp. 206-216. DOI: 10.5555/3314872.3314897. Online publication date: 16 Feb 2019.
    • (2019) WCCV. Proceedings of the ACM International Conference on Supercomputing, pp. 319-329. DOI: 10.1145/3330345.3331059. Online publication date: 26 Jun 2019.
    • (2019) Exploiting SIMD Asymmetry in ARM-to-x86 Dynamic Binary Translation. ACM Transactions on Architecture and Code Optimization 16(1), pp. 1-24. DOI: 10.1145/3301488. Online publication date: 13 Feb 2019.
    • (2018) Partial control-flow linearization. ACM SIGPLAN Notices 53(4), pp. 543-556. DOI: 10.1145/3296979.3192413. Online publication date: 11 Jun 2018.
