Efficient orchestration of sub-word parallelism in media processors

Published: 27 June 2004

Abstract

Communication and multimedia applications with increased data rates and enhanced functionality continuously raise the bar for the computational requirements of future microprocessors. To meet these computational demands, it is necessary to exploit sub-word parallelism efficiently. We propose to make sub-word data movement a first-class operation in microprocessor architectures by introducing a Sub-word Permutation Unit (SPU) in the execution pipeline. The SPU is evaluated in the context of the MMX media co-processor for the Intel Pentium architectures, but our results can be extended to any processor that supports sub-word parallelism. We find that the SPU allows us to orchestrate sub-word data placement prior to computation, thus allowing the MMX functional units to concentrate on performing calculations. Furthermore, we introduce a decoupled SPU control mechanism at the basic block level, which allows static optimization to eliminate data-movement overhead in tight loops, where most media and signal processing occurs. We demonstrate that an improvement of 4% to 20% can be obtained on key media and signal processing kernels with as little as a 1% increase in hardware resources.
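To make the idea of sub-word data placement concrete, the following is a minimal software sketch, not the SPU design from the paper. It assumes a 64-bit register holding four 16-bit sub-words (one of the packed layouts MMX supports) and models a permutation as a list of source lane indices. The names `pack`, `unpack`, `permute`, and `add_lanes` are hypothetical helpers introduced only for illustration.

```python
MASK16 = 0xFFFF  # assumption: 16-bit sub-words, four per 64-bit register


def pack(subwords):
    """Pack four 16-bit values into one 64-bit word, lane 0 in the low bits."""
    word = 0
    for i, value in enumerate(subwords):
        word |= (value & MASK16) << (16 * i)
    return word


def unpack(word):
    """Split a 64-bit word back into its four 16-bit lanes."""
    return [(word >> (16 * i)) & MASK16 for i in range(4)]


def permute(word, control):
    """Destination lane i receives source lane control[i].

    This stands in for the data-movement step; the control list plays the
    role of a statically chosen permutation (hypothetical encoding).
    """
    lanes = unpack(word)
    return pack([lanes[src] for src in control])


def add_lanes(a, b):
    """Lane-wise 16-bit addition with wraparound, like a packed add."""
    return pack([(x + y) & MASK16 for x, y in zip(unpack(a), unpack(b))])


a = pack([1, 2, 3, 4])
b = pack([10, 20, 30, 40])

# Place the sub-words of b in reversed order first, then compute.
reversed_b = permute(b, [3, 2, 1, 0])
result = unpack(add_lanes(a, reversed_b))

# A broadcast is just another permutation: every lane reads source lane 0.
broadcast_a = unpack(permute(a, [0, 0, 0, 0]))
```

In this sketch the permutation is fixed before the arithmetic runs, which mirrors the abstract's point that data placement can be settled ahead of computation so the arithmetic step does nothing but calculate.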

Cited By

  • (2018) Avoiding conversion and rearrangement overhead in SIMD architectures. International Journal of Parallel Programming, 34:3 (237-260). DOI: 10.1007/s10766-006-0015-0. Online publication date: 27-Dec-2018
  • (2017) Hardware/Software Approach to Designing Low-Power RNS-Enhanced Arithmetic Units. IEEE Transactions on Circuits and Systems I: Regular Papers, 64:5 (1031-1039). DOI: 10.1109/TCSI.2017.2669108. Online publication date: May-2017
  • (2005) Matrix register file and extended subwords. Proceedings of the 2nd conference on Computing frontiers (171-179). DOI: 10.1145/1062261.1062291. Online publication date: 4-May-2005

Published In

SPAA '04: Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures
June 2004
332 pages
ISBN: 1581138407
DOI: 10.1145/1007912

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. decoupled control
2. media processors
3. parallelism
4. sub word

Conference

SPAA04

Acceptance Rates

Overall Acceptance Rate 447 of 1,461 submissions, 31%
