Dandelion: a compiler and runtime for heterogeneous systems

CJ Rossbach, Y Yu, J Currey, JP Martin… - Proceedings of the …, 2013 - dl.acm.org
CJ Rossbach, Y Yu, J Currey, JP Martin, D Fetterly
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, 2013 - dl.acm.org
Computer systems increasingly rely on heterogeneity to achieve greater performance, scalability and energy efficiency. Because heterogeneous systems typically comprise multiple execution contexts with different programming abstractions and runtimes, programming them remains extremely challenging.
Dandelion is a system designed to address this programmability challenge for data-parallel applications. Dandelion provides a unified programming model for heterogeneous systems that span diverse execution contexts including CPUs, GPUs, FPGAs, and the cloud. It adopts the .NET LINQ (Language INtegrated Query) approach, integrating data-parallel operators into general-purpose programming languages such as C# and F#. It therefore provides an expressive data model and native language integration for user-defined functions, enabling programmers to write applications using standard high-level languages and development tools.
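To make the LINQ-style idea concrete, the following is a minimal sketch in Python rather than C# or F#. It is not Dandelion's API: the `Query` class and its `where`, `select`, and `aggregate` methods are hypothetical names chosen only to show how data-parallel operators can be composed with ordinary user-defined functions of the host language and evaluated lazily.

```python
from typing import Callable, Generic, Iterable, Iterator, TypeVar

T = TypeVar("T")
U = TypeVar("U")


class Query(Generic[T]):
    """Hypothetical LINQ-style query wrapper (illustration only, not Dandelion's API)."""

    def __init__(self, make_iter: Callable[[], Iterator[T]]) -> None:
        # Store a factory so nothing runs until the query is consumed.
        self._make_iter = make_iter

    @staticmethod
    def from_iterable(items: Iterable[T]) -> "Query[T]":
        return Query(lambda: iter(items))

    def where(self, pred: Callable[[T], bool]) -> "Query[T]":
        # Filter operator: keeps elements for which the user-defined predicate holds.
        return Query(lambda: (x for x in self._make_iter() if pred(x)))

    def select(self, fn: Callable[[T], U]) -> "Query[U]":
        # Map operator: applies a user-defined function to each element.
        return Query(lambda: (fn(x) for x in self._make_iter()))

    def aggregate(self, seed: U, fn: Callable[[U, T], U]) -> U:
        # Reduce operator: this call forces evaluation of the whole pipeline.
        acc = seed
        for x in self._make_iter():
            acc = fn(acc, x)
        return acc

    def __iter__(self) -> Iterator[T]:
        return self._make_iter()


def square(x: int) -> int:
    """An ordinary user-defined function written in the host language."""
    return x * x


numbers = Query.from_iterable(range(10))
sum_of_even_squares = (
    numbers.where(lambda x: x % 2 == 0)
    .select(square)
    .aggregate(0, lambda acc, x: acc + x)
)
```

Because each operator only records what to compute, a runtime in this style is free to inspect the whole pipeline before deciding where to execute it, which is the property the abstract relies on.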
Dandelion automatically and transparently distributes data-parallel portions of a program to available computing resources, including compute clusters for distributed execution and CPU and GPU cores of individual nodes for parallel execution. To enable automatic execution of .NET code on GPUs, Dandelion cross-compiles .NET code to CUDA kernels and uses the PTask runtime [85] to manage GPU execution. This paper discusses the design and implementation of Dandelion, focusing on the distributed CPU and GPU implementation. We evaluate the system using a diverse set of workloads.
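The distribution step described above can be pictured as partition, parallel apply, and merge. The sketch below is an assumption-laden stand-in written in Python: a local thread pool plays the role of the available compute resources, and the helper names `partition` and `run_data_parallel` are hypothetical. It does not model cross-compilation to CUDA, the PTask runtime, or cluster scheduling.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def partition(items: Iterable[T], n_parts: int) -> List[List[T]]:
    """Split the input into at most n_parts contiguous chunks, preserving order."""
    data = list(items)
    size = max(1, -(-len(data) // n_parts))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def run_data_parallel(
    items: Iterable[T], fn: Callable[[T], U], n_workers: int = 4
) -> List[U]:
    """Apply fn to every element, one chunk per worker, then merge in order.

    The thread pool is only a stand-in for real execution contexts
    (cluster nodes, CPU cores, GPUs); this is an illustrative assumption.
    """
    chunks = partition(items, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map returns results in the same order as the input chunks.
        partials = list(pool.map(lambda chunk: [fn(x) for x in chunk], chunks))
    return [y for part in partials for y in part]


squares = run_data_parallel(range(10), lambda x: x * x, n_workers=3)
```

The caller sees a single function call and an ordered result; how the chunks are placed on workers stays hidden, which mirrors the "automatic and transparent" distribution the abstract claims for the real system.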