Chapter, DOI: 10.5555/1980776.1980787

A multithreaded multicore system for embedded media processing

Published: 01 January 2011

Abstract

We describe a multicore system for media processing applications in which the cores are multithreaded. The cores use a new type of multithreading that we call Subset Static Interleaved (SSI) multithreading. SSI multithreading combines the advantages of blocked multithreading and a simple form of interleaved multithreading called static interleaved multithreading. It divides threads into foreground and background threads and statically interleaves the foreground threads. Whenever a foreground thread stalls, it is swapped with a runnable background thread. SSI multithreading achieves reduced operation latencies, memory latency tolerance, and fast context switching, and, compared to traditional dynamic interleaving, it keeps the design complexity of the register file relatively low.
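The foreground/background scheme above can be illustrated with a small cycle-level model. This is a minimal sketch under assumptions of our own, not the authors' hardware: each thread is a hypothetical list of operations, where `"c"` issues and completes in one cycle and `"m"` is a load that misses and stalls its thread for `miss_latency` cycles. Foreground slots issue in a fixed round-robin order (static interleaving), and a stalled or finished foreground thread is swapped for a runnable background thread, at an assumed cost of zero cycles, when its slot comes up. The names `ssi_simulate`, `programs`, `num_foreground`, and `miss_latency` are illustrative only.

```python
from collections import deque


def ssi_simulate(programs, num_foreground, miss_latency):
    """Cycle-level toy model of SSI multithreading (illustrative assumptions only).

    programs: one list of ops per thread; "c" = single-cycle op, "m" = load that
    misses and stalls its thread for miss_latency cycles after issuing.
    Returns (trace, cycles): trace[c] is the thread that issued in cycle c, or None.
    """
    n = len(programs)
    pc = [0] * n        # index of the next op of each thread
    ready_at = [0] * n  # first cycle in which each thread may issue again
    # The first num_foreground threads occupy the foreground slots; the rest wait.
    foreground = list(range(min(num_foreground, n)))
    foreground += [None] * (num_foreground - len(foreground))
    background = deque(range(num_foreground, n))
    trace = []
    cycle = 0

    def finished(t):
        return pc[t] >= len(programs[t])

    def runnable(t):
        return not finished(t) and ready_at[t] <= cycle

    while not all(finished(t) for t in range(n)):
        slot = cycle % num_foreground  # static interleaving: fixed round-robin over slots
        t = foreground[slot]
        if t is None or not runnable(t):
            # Swap the stalled (or finished) foreground thread for the first runnable
            # background thread; assumed to cost zero cycles in this model.
            for _ in range(len(background)):
                cand = background.popleft()
                if runnable(cand):
                    if t is not None and not finished(t):
                        background.append(t)
                    foreground[slot] = t = cand
                    break
                background.append(cand)
        if t is not None and runnable(t):
            op = programs[t][pc[t]]
            pc[t] += 1
            ready_at[t] = cycle + 1 + (miss_latency if op == "m" else 0)
            trace.append(t)
        else:
            trace.append(None)  # bubble: the slot's thread cannot issue
        cycle += 1
    return trace, cycle


trace, cycles = ssi_simulate(
    [["c", "m", "c"], ["c", "c"], ["c", "c", "c"]], num_foreground=2, miss_latency=3
)
print(trace, cycles)  # [0, 1, 0, 1, 2, None, 2, 0, 2] 9
```

In this run, thread 0's miss in cycle 2 is hidden by swapping in background thread 2 at thread 0's next slot; the single bubble (cycle 5) occurs because thread 1 has finished and the only background thread is still stalled. A real core would pay some swap cost, which the sketch ignores.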
We use a task scheduling unit (TSU) to dispatch tasks to the cores. Because the TSU is aware that the cores are multithreaded, it can map tasks to cores more efficiently by scheduling them on the least loaded cores.
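A least-loaded dispatch policy that accounts for multithreading can be sketched as follows. The class name, its methods, and the load metric (the number of tasks currently occupying a core's hardware threads) are hypothetical simplifications; the abstract does not describe the TSU's interface. A core stays eligible until all of its hardware threads are busy, which is what allows a multithreading-aware scheduler to place more than one task per core.

```python
class TaskSchedulingUnit:
    """Hypothetical model of a multithreading-aware TSU.

    Each core has threads_per_core hardware threads; load[c] counts the tasks
    currently running on core c.
    """

    def __init__(self, num_cores, threads_per_core):
        self.threads_per_core = threads_per_core
        self.load = [0] * num_cores

    def dispatch(self):
        """Assign a ready task to the least loaded core; return its index, or None
        if every hardware thread on every core is busy (the task must wait)."""
        core = min(range(len(self.load)), key=self.load.__getitem__)  # ties -> lowest index
        if self.load[core] >= self.threads_per_core:
            return None
        self.load[core] += 1
        return core

    def complete(self, core):
        """Free one hardware thread on core when a task running there finishes."""
        if self.load[core] == 0:
            raise ValueError(f"no task is running on core {core}")
        self.load[core] -= 1


tsu = TaskSchedulingUnit(num_cores=2, threads_per_core=2)
print([tsu.dispatch() for _ in range(5)])  # [0, 1, 0, 1, None]
tsu.complete(0)
print(tsu.dispatch())  # 0
```

With two cores of two threads each, the first four tasks alternate between the cores and the fifth must wait until a task completes; a scheduler that treated each core as a single execution slot would already have to hold back the third task.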
We evaluate the system on an optimized Super HD H.264 decoder in which macroblock decoding and deblocking have been parallelized. The complexity of the H.264 standard and the high resolution make this a challenging, performance-demanding application. We achieve speedups of up to 17.7 times for 16 cores with four threads per core, relative to a single-threaded single core. Furthermore, the proposed SSI multithreading achieves a speedup of 1.52 times relative to no multithreading, whereas blocked multithreading achieves only 1.38 times and a restricted form of interleaved multithreading only 1.37 times.
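The abstract does not say how macroblock decoding was parallelized. One widely used approach for H.264 is the 2D wavefront: a macroblock can be decoded once its left, top-left, top, and top-right neighbors are done, so macroblocks along an anti-diagonal can be processed concurrently. The sketch below assumes that dependency pattern and a uniform one-step decode time per macroblock (real macroblocks vary widely, and deblocking is not modeled); it computes the earliest step of each macroblock and the peak number that can be in flight. For frames at least two macroblocks wide, the step of macroblock (x, y) works out to x + 2y. The function names are illustrative.

```python
from collections import Counter


def wavefront_steps(mb_cols, mb_rows):
    """Earliest decode step of every macroblock under 2D-wave dependencies
    (left, top-left, top, top-right), assuming each macroblock takes one step."""
    step = [[0] * mb_cols for _ in range(mb_rows)]
    for y in range(mb_rows):
        for x in range(mb_cols):
            deps = []
            if x > 0:
                deps.append(step[y][x - 1])          # left
            if y > 0:
                deps.append(step[y - 1][x])          # top
                if x > 0:
                    deps.append(step[y - 1][x - 1])  # top-left
                if x + 1 < mb_cols:
                    deps.append(step[y - 1][x + 1])  # top-right
            step[y][x] = max(deps) + 1 if deps else 0
    return step


def max_parallelism(step):
    """Largest number of macroblocks that share the same step (can run concurrently)."""
    return max(Counter(s for row in step for s in row).values())


steps = wavefront_steps(mb_cols=8, mb_rows=4)
print(steps[3][7], max_parallelism(steps))  # 13 4
```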

Cited By

  • (2017) Partition Scheduling on Heterogeneous Multicore Processors for Multi-dimensional Loops Applications. International Journal of Parallel Programming 45(4), 827-852. DOI: 10.1007/s10766-016-0445-2. Online publication date: 1-Aug-2017.
  • (2014) A paradigm shift in GP-GPU computing. Proceedings of the sixth international workshop on Data intensive distributed computing, pp. 29-34. DOI: 10.1145/2608020.2608024. Online publication date: 23-Jun-2014.

Published In

Transactions on high-performance embedded architectures and compilers III
January 2011
300 pages
ISBN: 9783642194474
Editor: Per Stenström

Publisher

Springer-Verlag, Berlin, Heidelberg
