DOI: 10.1145/1062261.1062291

Matrix register file and extended subwords: two techniques for embedded media processors

Published: 04 May 2005

Abstract

In this paper we employ two techniques suitable for embedded media processors. The first technique, extended subwords, uses four extra bits for every byte in a media register. This allows many SIMD operations to be performed without overflow and avoids the packing/unpacking conversion overhead caused by the mismatch between storage and computational formats. The second technique, the Matrix Register File (MRF), allows flexible row-wise as well as column-wise access to the register file. It is useful for many block-based multimedia kernels such as the (I)DCT, the 2x2 Haar transform, and pixel padding. In addition, we propose a few new media instructions. To evaluate these techniques, we employ Modified MMX (MMMX), that is, MMX with extended subwords. Our results show that MMMX combined with an MRF reduces the dynamic number of instructions by up to 80% compared to other multimedia extensions such as MMX.
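
The two ideas in the abstract can be illustrated with a small software model. The sketch below is not the MMMX instruction set or the authors' hardware design; the names simd_add and MatrixRegisterFile, the 4-lane registers, and the 4x4 register file are hypothetical choices made only for illustration. It assumes 12-bit lanes (one byte plus the four extra bits mentioned above) and compares them with plain 8-bit lanes, then shows a register file that can be read along rows or along columns.

```python
# Illustrative model only: names, lane counts, and sizes are assumptions,
# not the instruction set or hardware described in the paper.

def simd_add(a, b, lane_bits):
    """Lane-wise wrap-around addition with lanes that are lane_bits wide."""
    mask = (1 << lane_bits) - 1
    return [(x + y) & mask for x, y in zip(a, b)]


class MatrixRegisterFile:
    """Toy register file: each register is one row of a square matrix of lanes."""

    def __init__(self, size):
        self.size = size
        self.cells = [[0] * size for _ in range(size)]

    def write_row(self, r, values):
        assert len(values) == self.size
        self.cells[r] = list(values)

    def read_row(self, r):
        # Conventional access: one register, all of its lanes.
        return list(self.cells[r])

    def read_column(self, c):
        # Column-wise access: lane c of every register.
        return [row[c] for row in self.cells]


# Extended subwords: the sum of two 8-bit pixels needs up to 9 bits.
pixels_a = [200, 255, 17, 128]
pixels_b = [100, 255, 3, 128]
wrapped = simd_add(pixels_a, pixels_b, 8)     # 8-bit lanes overflow
extended = simd_add(pixels_a, pixels_b, 12)   # 12-bit lanes hold the true sums

# Matrix register file: load a 4x4 block by rows, then read it by columns.
mrf = MatrixRegisterFile(4)
for r in range(4):
    mrf.write_row(r, [4 * r + c for c in range(4)])
column_1 = mrf.read_column(1)
```

With 8-bit lanes, a kernel would first have to unpack the pixels to a wider format to obtain the exact sums; in this model the wider lanes make that step unnecessary. Likewise, reading columns directly stands in for the explicit transpose that a row-only register file would need between the row and column passes of a block transform.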


Published In

CF '05: Proceedings of the 2nd conference on Computing frontiers
May 2005
467 pages
ISBN:1595930191
DOI:10.1145/1062261
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. embedded media processors
  2. multimedia kernels
  3. register file
  4. sub-word parallelism

Qualifiers

  • Article

Conference

CF05: Computing Frontiers Conference
May 4 - 6, 2005
Ischia, Italy

Acceptance Rates

Overall Acceptance Rate 273 of 785 submissions, 35%

Cited By

  • (2020) Performance Improvement of Gaussian Filter using SIMD Technology. 2020 International Conference on Machine Vision and Image Processing (MVIP), pp. 1-6. DOI: 10.1109/MVIP49855.2020.9116883. Online publication date: Feb-2020
  • (2019) Performance Improvement of Multimedia Kernels Using Data- and Thread- Level Parallelism on CPU Platform. High-Performance Computing and Big Data Analysis, pp. 459-467. DOI: 10.1007/978-3-030-33495-6_35. Online publication date: 20-Oct-2019
  • (2018) The Case for Polymorphic Registers in Dataflow Computing. International Journal of Parallel Programming, 46:6, pp. 1185-1219. DOI: 10.1007/s10766-017-0494-1. Online publication date: 1-Dec-2018
  • (2015) Restructuring and implementations of 2D matrix transpose algorithm using SSE4 vector instructions. 2015 International Conference on Applied Research in Computer Science and Engineering (ICAR), pp. 1-7. DOI: 10.1109/ARCSE.2015.7338144. Online publication date: Oct-2015
  • (2013) Separable 2d convolution with polymorphic register files. Proceedings of the 26th international conference on Architecture of Computing Systems, pp. 317-328. DOI: 10.1007/978-3-642-36424-2_27. Online publication date: 19-Feb-2013
  • (2012) Vector Register Design with Register Bypassing for Embedded DSP Core. 2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems, pp. 1033-1038. DOI: 10.1109/HPCC.2012.151. Online publication date: Jun-2012
  • (2011) Scalability evaluation of a polymorphic register file. Proceedings of the 24th international conference on Architecture of computing systems, pp. 13-25. DOI: 10.5555/1966221.1966224. Online publication date: 24-Feb-2011
  • (2011) StVEC. Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, pp. 276-287. DOI: 10.1109/PACT.2011.59. Online publication date: 10-Oct-2011
  • (2011) Scalability Evaluation of a Polymorphic Register File: A CG Case Study. Architecture of Computing Systems - ARCS 2011, pp. 13-25. DOI: 10.1007/978-3-642-19137-4_2. Online publication date: 2011
  • (2010) Permutation optimization for SIMD devices. Proceedings of 2010 IEEE International Symposium on Circuits and Systems, pp. 3849-3852. DOI: 10.1109/ISCAS.2010.5537700. Online publication date: May-2010
