DOI: 10.1145/349299.349320

Exploiting superword level parallelism with multimedia instruction sets

Published: 01 May 2000

Abstract

Increasing focus on multimedia applications has prompted the addition of multimedia extensions to most existing general purpose microprocessors. This added functionality comes primarily with the addition of short SIMD instructions. Unfortunately, access to these instructions is limited to in-line assembly and library calls. Generally, it has been assumed that vector compilers provide the most promising means of exploiting multimedia instructions. Although vectorization technology is well understood, it is inherently complex and fragile. In addition, it is incapable of locating SIMD-style parallelism within a basic block.
In this paper we introduce the concept of Superword Level Parallelism (SLP), a novel way of viewing parallelism in multimedia and scientific applications. We believe SLP is fundamentally different from the loop level parallelism exploited by traditional vector processing, and therefore demands a new method of extracting it. We have developed a simple and robust compiler for detecting SLP that targets basic blocks rather than loop nests. As with techniques designed to extract ILP, ours is able to exploit parallelism both across loop iterations and within basic blocks. The result is an algorithm that provides excellent performance in several application domains. In our experiments, dynamic instruction counts were reduced by 46%. Speedups ranged from 1.24 to 6.70.
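
As a rough illustration of the idea (not code from the paper), the Python sketch below shows a basic block of four isomorphic, independent statements on adjacent array elements, such as might arise from unrolling a loop, next to a packed version that does the same work with a single element-wise operation. The names `scalar_block`, `packed_block`, `simd_add`, and `LANES` are hypothetical, the width of four lanes is an assumption, and `simd_add` only stands in for a short SIMD instruction.

```python
# Toy illustration of superword level parallelism (SLP).
# All names and the 4-lane width are assumptions made for this sketch.

LANES = 4  # assumed superword width: four elements per SIMD register


def scalar_block(a, b, c, i):
    """A basic block of four isomorphic, independent scalar statements."""
    a[i + 0] = b[i + 0] + c[i + 0]
    a[i + 1] = b[i + 1] + c[i + 1]
    a[i + 2] = b[i + 2] + c[i + 2]
    a[i + 3] = b[i + 3] + c[i + 3]


def simd_add(x, y):
    """Stand-in for one short SIMD add: element-wise over a superword."""
    return [p + q for p, q in zip(x, y)]


def packed_block(a, b, c, i):
    """The same block after packing: two wide loads, one add, one wide store."""
    vb = b[i:i + LANES]                # packed load of adjacent elements
    vc = c[i:i + LANES]                # packed load of adjacent elements
    a[i:i + LANES] = simd_add(vb, vc)  # one SIMD add, then a packed store


b = [1, 2, 3, 4, 5, 6, 7, 8]
c = [10, 20, 30, 40, 50, 60, 70, 80]
a_scalar = [0] * 8
a_packed = [0] * 8
for i in range(0, 8, LANES):
    scalar_block(a_scalar, b, c, i)
    packed_block(a_packed, b, c, i)

print(a_scalar == a_packed)
```

The packing here happens inside a single basic block and needs no loop analysis, which is the contrast with loop-oriented vectorization that the abstract draws. A real compiler would also have to check data dependences and memory alignment before packing, and this sketch ignores both.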

      Published In

      PLDI '00: Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
      August 2000
      358 pages
      ISBN: 1581131992
      DOI: 10.1145/349299
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Conference

      PLDI00

      Acceptance Rates

      PLDI '00 Paper Acceptance Rate 30 of 173 submissions, 17%;
      Overall Acceptance Rate 406 of 2,067 submissions, 20%
