DOI: 10.1109/IPDPS.2005.94
Article

An Empirical Study On the Vectorization of Multimedia Applications for Multimedia Extensions

Published: 04 April 2005

Abstract

Multimedia extensions (MME) are architectural extensions to general-purpose processors that boost the performance of multimedia workloads. Today, these extensions are most commonly programmed through in-line assembly code, intrinsic functions, and library routines. A promising alternative is to use vectorization technology to generate MME instructions automatically from programs written in standard high-level languages. However, despite the early success of automatic vectorization on traditional vector supercomputers, state-of-the-art vectorizing compilers for multimedia extensions have yet to demonstrate their effectiveness, especially on multimedia workloads. In this paper, we present an empirical study of the vectorization of media processing programs for multimedia extensions. Our study identified several new issues that traditional vectorizers do not handle. These issues arise partly from the unique features of MME architectures and partly from the characteristics of media processing applications. We propose several techniques to address some of these issues and assess their effectiveness by applying them manually to a set of multimedia programs. In addition, we found that further optimizations after vectorization are essential to benefit from multimedia extensions. In our experiments, 23 of the 34 core procedures from the Berkeley Multimedia Workload (BMW) were manually vectorized, and 14 procedures achieved speedups of 1.10 to 3.39 on a Pentium 4 processor.
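As an illustration of the programming styles the abstract contrasts, the following C sketch (not code from the paper) shows a typical media-kernel idiom, saturating addition of two rows of 8-bit pixels, first as the plain scalar loop a vectorizing compiler would start from and then hand-written with SSE2 intrinsics of the kind available on the Pentium 4. The function names are hypothetical; the intrinsic path is compiled only when `__SSE2__` is defined and otherwise falls back to the scalar loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics (Pentium 4 and later) */
#endif

/* Scalar form, as a vectorizer would see it (hypothetical name):
   add two rows of 8-bit pixels and clamp each sum to 255. */
void add_sat_u8_scalar(uint8_t *dst, const uint8_t *a,
                       const uint8_t *b, size_t n) {
    for (size_t i = 0; i < n; i++) {
        unsigned sum = (unsigned)a[i] + (unsigned)b[i];
        dst[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}

/* Hand-vectorized form (hypothetical name): 16 pixels per iteration
   using SSE2's unsigned saturating byte add, then a scalar epilogue. */
void add_sat_u8_simd(uint8_t *dst, const uint8_t *a,
                     const uint8_t *b, size_t n) {
    assert(n == 0 || (dst != NULL && a != NULL && b != NULL));
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 16 <= n; i += 16) {
        /* Unaligned loads/stores, so no 16-byte alignment is assumed. */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epu8(va, vb));
    }
#endif
    /* Leftover elements (or the whole array when SSE2 is unavailable). */
    add_sat_u8_scalar(dst + i, a + i, b + i, n - i);
}
```

Calling both functions on the same inputs, for example a 37-element row so that the vector loop covers 32 elements and the epilogue the last 5, should give identical results. The unaligned loads and the remainder loop stand in for the alignment and trip-count handling that an automatic vectorizer would otherwise have to generate.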

Published In

IPDPS '05: Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
April 2005
ISBN:0769523129

Publisher

IEEE Computer Society

United States

Cited By

  • (2024) When Is Parallelism Fearless and Zero-Cost with Rust? Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures, pp. 27-40. DOI: 10.1145/3626183.3659966. Online publication date: 17-Jun-2024.
  • (2023) Enhancing LLVM Optimizations for Linear Recurrence Programs on RVV. Proceedings of the 52nd International Conference on Parallel Processing Workshops, pp. 79-87. DOI: 10.1145/3605731.3605904. Online publication date: 7-Aug-2023.
  • (2015) Evaluating vector data type usage in OpenCL kernels. Concurrency and Computation: Practice & Experience, 27(17), pp. 4586-4602. DOI: 10.1002/cpe.3424. Online publication date: 10-Dec-2015.
  • (2012) Portable Parallel Programs using architecture-aware libraries. Proceedings of the 27th Annual ACM Symposium on Applied Computing, pp. 1922-1924. DOI: 10.1145/2245276.2232093. Online publication date: 26-Mar-2012.
  • (2009) Optimizing techniques for saturated arithmetic with first-order linear recurrence. Proceedings of the 2009 ACM Symposium on Applied Computing, pp. 1883-1889. DOI: 10.1145/1529282.1529704. Online publication date: 8-Mar-2009.
  • (2009) On the exploitation of loop-level parallelism in embedded applications. ACM Transactions on Embedded Computing Systems, 8(2), pp. 1-34. DOI: 10.1145/1457255.1457257. Online publication date: 9-Feb-2009.
  • (2009) Compiler-Based Performance Evaluation of an SIMD Processor with a Multi-Bank Memory Unit. Journal of Signal Processing Systems, 56(2-3), pp. 249-260. DOI: 10.1007/s11265-008-0229-z. Online publication date: 1-Sep-2009.
  • (2009) Mapping streaming languages to general purpose processors through vectorization. Proceedings of the 22nd International Conference on Languages and Compilers for Parallel Computing, pp. 95-110. DOI: 10.1007/978-3-642-13374-9_7. Online publication date: 8-Oct-2009.
  • (2007) Instruction selection for subword level parallelism optimizations for application specific instruction processors. Proceedings of the 5th International Conference on Parallel and Distributed Processing and Applications, pp. 946-957. DOI: 10.5555/2395970.2396061. Online publication date: 29-Aug-2007.
  • (2006) Challenges in exploitation of loop parallelism in embedded applications. Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis, pp. 173-180. DOI: 10.1145/1176254.1176298. Online publication date: 22-Oct-2006.
