CellCilk: Extending Cilk for Heterogeneous Multicore Platforms

Werth, Tobias; Schreier, Silvia; Philippsen, Michael

doi:10.1007/978-3-642-36036-7_7

Tobias Werth¹⁷,
Silvia Schreier¹⁸ &
Michael Philippsen¹⁷

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 7146))

Included in the following conference series:

International Workshop on Languages and Compilers for Parallel Computing

952 Accesses

Abstract

The potential of heterogeneous multicores, like the Cell BE, can only be exploited if the host and the accelerator cores are used in parallel and if the specific features of the cores are considered. Parallel programming, especially when applied to irregular task-parallel problems, is challenging itself. However, heterogeneous multicores add to that complexity due to their memory hierarchy and specialized accelerators. As a solution for these issues we present CellCilk, a prototype implementation of Cilk for heterogeneous multicores with a host/accelerator design, using the Cell BE in particular. CellCilk introduces a new keyword (spu_spawn) for task creation on the accelerator cores. Task scheduling and load balancing are done by a novel dynamic cross-hierarchy work-stealing regime. Furthermore, the CellCilk runtime employs a garbage collection mechanism for distributed data structures that are created during scheduling. On benchmarks we achieve a good speedup and reasonable runtimes, even when compared to manually parallelized codes.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Celerity: High-Level C++ for Accelerator Clusters

Accelerating Scientific Applications on Heterogeneous Systems with HybridOMP

A taxonomy of task-based parallel programming technologies for high-performance computing

Article Open access 12 January 2018

References

Bellens, P., Pérez, J.M., Cabarcas, F., Ramírez, A., Badia, R.M., Labarta, J.: CellSs: Scheduling techniques to better exploit memory hierarchy. Scientific Programming 17(1-2), 77–95 (2009)
Google Scholar
Blumofe, R.D., Frigo, M., Joerg, C.F., Leiserson, C.E., Randall, K.H.: An Analysis of Dag-Consistent Distributed Shared-Memory Algorithms. In: SPAA 1996: Proc. Symp. Parallel Algorithms and Architectures, Padua, Italy, pp. 297–308 (June 1996)
Google Scholar
Cao, Q., Hu, C., He, H., Huang, X., Li, S.: Support for OpenMP Tasks on Cell Architecture. In: Hsu, C.-H., Yang, L.T., Park, J.H., Yeo, S.-S. (eds.) ICA3PP 2010, Part II. LNCS, vol. 6082, pp. 308–317. Springer, Heidelberg (2010)
Chapter Google Scholar
Cooper, P., Dolinsky, U., Donaldson, A.F., Richards, A., Riley, C., Russell, G.: Offload – Automating Code Migration to Heterogeneous Multicore Systems. In: Patt, Y.N., Foglia, P., Duesterwald, E., Faraboschi, P., Martorell, X. (eds.) HiPEAC 2010. LNCS, vol. 5952, pp. 337–352. Springer, Heidelberg (2010)
Chapter Google Scholar
Dinan, J., Larkins, D.B., Sadayappan, P., Krishnamoorthy, S., Nieplocha, J.: Scalable work stealing. In: SC 2009: Proc. Conf. High Performance Computing Networking, Storage and Analysis, Portland, OR, pp. 53:1–53:11. ACM (November 2009)
Google Scholar
Duff, T.: Duff’s device. Usenet posting (November 1983), http://www.lysator.liu.se/c/duffs-device.html
Frigo, M., Leiserson, C.E., Randall, K.H.: The implementation of the Cilk-5 multithreaded language. In: PLDI 1998: Proc. Conf. Programming Language Design and Impl., Montreal, Canada, pp. 212–223 (June 1998)
Google Scholar
Frigo, M., Strumpen, V.: Cache oblivious stencil computations. In: ICS 2005: Proc. Intl. Conf. Supercomputing, Cambridge, MA, pp. 361–366 (June 2005)
Google Scholar
Frigo, M., Strumpen, V.: The cache complexity of multithreaded cache oblivious algorithms. In: SPAA 2006: Proc. Symp. Parallel Algorithms and Architectures, Cambridge, MA, pp. 271–280 (July 2006)
Google Scholar
Hackenberg, D.: Fast Matrix Multiplication on Cell (SMP) Systems (2009), http://www.tu-dresden.de/zih/cell/matmul
Jones, R., Lins, R.: Garbage Collection: Algorithms for Automatic Dynamic Memory Management. Wiley & Sons (1996)
Google Scholar
Leiserson, C.E.: Programming Irregular Parallel Applications in Cilk. In: Lüling, R., Bilardi, G., Ferreira, A., Rolim, J.D.P. (eds.) IRREGULAR 1997. LNCS, vol. 1253, pp. 61–71. Springer, Heidelberg (1997)
Chapter Google Scholar
Mendes, R., Whately, L., de Castro, M.C.S., Bentes, C., de Amorim, C.L.: Runtime System Support for Running Applications with Dynamic and Asynchronous Task Parallelism in Software DSM Systems. In: SBAC-PAD 2006: Symp. Computer Architecture and High Performance Computing, Ouro Preto, Brasil, pp. 159–166 (October 2006)
Google Scholar
O’Brien, K., O’Brien, K., Sura, Z., Chen, T., Zhang, T.: Supporting OpenMP on Cell. In: Chapman, B., Zheng, W., Gao, G.R., Sato, M., Ayguadé, E., Wang, D. (eds.) IWOMP 2007. LNCS, vol. 4935, pp. 65–76. Springer, Heidelberg (2008)
Chapter Google Scholar
Peng, L., Wong, W.F., Yuen, C.K.: The Performance Model of SilkRoad - A Multithreaded DSM System for Clusters. In: CCGRID 2003: Intl. Symp. Cluster Computing and the Grid, Tokyo, Japan, pp. 495–501 (May 2003)
Google Scholar
Randall, K.H.: Cilk: Efficient Multithreaded Computing. Ph.D. thesis, Massachusetts Institute of Technology (June 1998)
Google Scholar
Seo, S., Lee, J., Sura, Z.: Design and implementation of software-managed caches for multicores with local memory. In: HPCA 2009: Intl. Conf. High-Performance Computer Architecture, Raleigh, NC, pp. 55–66. IEEE (February 2009)
Google Scholar
Werth, T., Floßmann, T., Klemm, M., Schell, D., Weigand, U., Philippsen, M.: Dynamic Code Footprint Optimization for the IBM Cell Broadband Engine. In: IWMSE 2009: Proc. ICSE Workshop on Multicore Software Engineering, Vancouver, Canada, pp. 64–72 (May 2009)
Google Scholar
Zeiser, T., Wellein, G., Iglberger, K., Nitsure, A., Rüde, U., Hager, G.: Introducing a parallel cache oblivious blocking approach for the Lattice Boltzmann Method. Progress in Computational Fluid Dynamics 8(1-4), 179–188 (2008)
Article MATH Google Scholar

Download references

Author information

Authors and Affiliations

Computer Science Department, Programming Systems Group, University of Erlangen-Nuremberg, Germany
Tobias Werth & Michael Philippsen
Faculty of Mathematics and Computer Science, Data Processing Technology, University of Hagen, Germany
Silvia Schreier

Authors

Tobias Werth
View author publications
You can also search for this author in PubMed Google Scholar
Silvia Schreier
View author publications
You can also search for this author in PubMed Google Scholar
Michael Philippsen
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Computer Science Department, Colorado State University, 80523-1873, Fort Collins, CO, USA
Sanjay Rajopadhye & Michelle Mills Strout &

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Werth, T., Schreier, S., Philippsen, M. (2013). CellCilk: Extending Cilk for Heterogeneous Multicore Platforms. In: Rajopadhye, S., Mills Strout, M. (eds) Languages and Compilers for Parallel Computing. LCPC 2011. Lecture Notes in Computer Science, vol 7146. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36036-7_7

Download citation

DOI: https://doi.org/10.1007/978-3-642-36036-7_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36035-0
Online ISBN: 978-3-642-36036-7
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

CellCilk: Extending Cilk for Heterogeneous Multicore Platforms

Abstract

Access this chapter

Subscribe and save

Buy Now

Preview

Similar content being viewed by others

Celerity: High-Level C++ for Accelerator Clusters

Accelerating Scientific Applications on Heterogeneous Systems with HybridOMP

A taxonomy of task-based parallel programming technologies for high-performance computing

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

CellCilk: Extending Cilk for Heterogeneous Multicore Platforms

Abstract

Access this chapter

Subscribe and save

Buy Now

Preview

Similar content being viewed by others

Celerity: High-Level C++ for Accelerator Clusters

Accelerating Scientific Applications on Heterogeneous Systems with HybridOMP

A taxonomy of task-based parallel programming technologies for high-performance computing

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation