chapter

A multithreaded multicore system for embedded media processing

Authors:

Jan Hoogerbrugge,

Andrei Terechko

Transactions on high-performance embedded architectures and compilers III

January 2011

Pages 154 - 173

Published: 01 January 2011

Abstract

We describe a multicore system targeting media processing applications in which the cores are multithreaded. The multithreaded cores use a new type of multithreading that we call Subset Static Interleaved (SSI) multithreading. SSI multithreading combines the advantages of blocked multithreading and a simple form of interleaved multithreading called static interleaved multithreading. SSI multithreading divides threads into foreground and background threads and performs static interleaving among the foreground threads. A foreground thread is swapped with a runnable background thread whenever the foreground thread is stalled. SSI multithreading achieves reduced operation latencies, memory latency tolerance, fast context switching, and, compared to traditional dynamic interleaving, a relatively low design complexity of the register file.
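The foreground/background mechanism described in the abstract can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions that are not taken from the chapter: stalls are given as a hypothetical set of (thread, cycle) pairs, a stalled foreground slot issues nothing in the cycle where the swap happens, and the function name `simulate_ssi` and its parameters are invented for illustration.

```python
from collections import deque


def simulate_ssi(num_threads, num_foreground, stalled, cycles):
    """Toy model of SSI multithreading (illustrative assumptions only).

    stalled: set of (thread_id, cycle) pairs at which a thread cannot issue.
    Returns a list with the thread that issued in each cycle, or None for a bubble.
    """
    # The first num_foreground threads start in the foreground slots;
    # the remaining threads wait in the background.
    foreground = list(range(num_foreground))
    background = deque(range(num_foreground, num_threads))
    trace = []
    for cycle in range(cycles):
        # Static interleaving: the slot is a fixed function of the cycle number.
        slot = cycle % num_foreground
        thread = foreground[slot]
        if (thread, cycle) in stalled:
            # Swap the stalled foreground thread with the first runnable
            # background thread, if there is one.
            replacement = next(
                (t for t in background if (t, cycle) not in stalled), None
            )
            if replacement is not None:
                background.remove(replacement)
                background.append(thread)
                foreground[slot] = replacement
            # Assumption: the slot issues nothing in the cycle of the swap.
            trace.append(None)
        else:
            trace.append(thread)
    return trace


# Four threads, two foreground slots, thread 0 stalls in cycle 2.
print(simulate_ssi(4, 2, {(0, 2)}, 6))
```

In this example thread 0 is replaced by background thread 2 in slot 0, while slot 1 keeps issuing from thread 1 on every other cycle, which is the property that keeps the interleaving pattern static.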

We use a task scheduling unit (TSU) to dispatch tasks to the cores. The TSU is aware that the cores are multithreaded, which enables a more efficient mapping of tasks to cores: tasks are scheduled on the least loaded cores.
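A least-loaded dispatch policy of the kind the abstract mentions might look like the sketch below. The abstract does not say how the TSU measures load, so this example assumes load is the number of occupied hardware thread contexts on a core; the function name `dispatch`, its parameters, and the tie-breaking rule (lowest core index) are hypothetical.

```python
def dispatch(tasks, num_cores, threads_per_core):
    """Assign each task to the least loaded core (illustrative sketch).

    Load is assumed to be the number of occupied thread contexts per core.
    Returns (placement, pending): a task -> core mapping and the tasks
    that found no free thread context.
    """
    load = [0] * num_cores
    placement = {}
    pending = []
    for task in tasks:
        # Pick the core with the fewest occupied contexts; ties go to the
        # lowest core index.
        core = min(range(num_cores), key=lambda c: load[c])
        if load[core] < threads_per_core:
            load[core] += 1
            placement[task] = core
        else:
            # Every core is full, so the task has to wait.
            pending.append(task)
    return placement, pending


placement, pending = dispatch(["a", "b", "c", "d", "e"], num_cores=2, threads_per_core=2)
print(placement, pending)
```

Because the policy looks at per-core thread occupancy rather than treating each core as a single slot, tasks spread across cores before any core receives a second thread.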

We evaluate the system on an optimized Super HD H.264 decoder in which macroblock decoding and deblocking have been parallelized. The complexity of the H.264 standard and the high resolution make this a challenging and performance-demanding application. We achieve speedups of up to 17.7 times for 16 cores with four threads per core relative to a single-threaded single core. Furthermore, the proposed SSI multithreading achieves a speedup of 1.52 times relative to no multithreading, while blocked multithreading achieves only 1.38 times and a restricted form of interleaved multithreading achieves only 1.37 times.

Published In

Transactions on high-performance embedded architectures and compilers III

January 2011

300 pages

ISBN:9783642194474

Editor:
Per Stenström
Chalmers University of Technology, Department of Computer Science and Engineering, Gothenburg, Sweden

Publisher

Springer-Verlag

Berlin, Heidelberg

Qualifiers

Chapter
