Exploiting superword level parallelism with multimedia instruction sets

Authors:

Saman Amarasinghe

PLDI '00: Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation

Pages 145 - 156

https://doi.org/10.1145/349299.349320

Published: 01 May 2000

Abstract

Increasing focus on multimedia applications has prompted the addition of multimedia extensions to most existing general purpose microprocessors. This added functionality comes primarily with the addition of short SIMD instructions. Unfortunately, access to these instructions is limited to in-line assembly and library calls. Generally, it has been assumed that vector compilers provide the most promising means of exploiting multimedia instructions. Although vectorization technology is well understood, it is inherently complex and fragile. In addition, it is incapable of locating SIMD-style parallelism within a basic block.

In this paper we introduce the concept of Superword Level Parallelism (SLP), a novel way of viewing parallelism in multimedia and scientific applications. We believe SLP is fundamentally different from the loop level parallelism exploited by traditional vector processing, and therefore demands a new method of extracting it. We have developed a simple and robust compiler for detecting SLP that targets basic blocks rather than loop nests. As with techniques designed to extract ILP, ours is able to exploit parallelism both across loop iterations and within basic blocks. The result is an algorithm that provides excellent performance in several application domains. In our experiments, dynamic instruction counts were reduced by 46%. Speedups ranged from 1.24 to 6.70.
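
As a rough illustration of the idea, the sketch below shows what superword level parallelism can look like inside a single basic block: four independent, isomorphic statements over adjacent elements are replaced by one wide operation. This is not the detection algorithm from the paper. The names `LANES`, `scalar_block`, `simd_add`, and `packed_block` are hypothetical, a Python tuple stands in for a SIMD register, and the width of four lanes is an assumption.

```python
# Hypothetical superword width: four elements per SIMD register (an assumption).
LANES = 4


def scalar_block(b, c):
    """Straight-line basic block with four isomorphic, independent statements."""
    a = [0.0] * LANES
    a[0] = b[0] + c[0]
    a[1] = b[1] + c[1]
    a[2] = b[2] + c[2]
    a[3] = b[3] + c[3]
    return a, 4  # result and the number of add operations issued


def simd_add(x, y):
    """Stand-in for one short SIMD add; each tuple models a superword register."""
    return tuple(xi + yi for xi, yi in zip(x, y))


def packed_block(b, c):
    """The same basic block after packing the four statements into one superword."""
    vb = tuple(b[:LANES])  # one wide load of adjacent elements
    vc = tuple(c[:LANES])  # one wide load of adjacent elements
    va = simd_add(vb, vc)  # one SIMD add replaces four scalar adds
    return list(va), 1     # result and the number of add operations issued


if __name__ == "__main__":
    b = [1.0, 2.0, 3.0, 4.0]
    c = [10.0, 20.0, 30.0, 40.0]
    print(scalar_block(b, c))
    print(packed_block(b, c))
```

Both functions compute the same values; only the number of add operations differs. Note that no loop is involved, which is the case the abstract says loop-oriented vectorization cannot handle.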

Index Terms

Exploiting superword level parallelism with multimedia instruction sets
1. Computer systems organization
  1. Architectures
    1. Serial architectures
  2. Embedded and cyber-physical systems
    1. Embedded systems

Published In

PLDI '00: Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation

August 2000

358 pages

ISBN:1581131992

DOI:10.1145/349299

Chairman:
Monica Lam

ACM SIGPLAN Notices Volume 35, Issue 5
May 2000
357 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/358438
Editor:
A. Michael Berman
Rowan Univ., Glassboro, NJ

Copyright © 2000 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

PLDI00

Sponsor:

PLDI00: ACM SIGPLAN 2000 Conference on Programming Language Design and Implementation

June 18 - 21, 2000

Vancouver, British Columbia, Canada

Acceptance Rates

PLDI '00 Paper Acceptance Rate 30 of 173 submissions, 17%;

Overall Acceptance Rate 406 of 2,067 submissions, 20%
