Proceeding Downloads
Julia Cloud Matrix Machine: Dynamic Matrix Language Acceleration on Multicore Clusters in the Cloud
- Jay Hwan Lee,
- Yeonsoo Kim,
- Yonghyun Ryu,
- Wasuwee Sodsong,
- Hyunjun Jeon,
- Jinsik Park,
- Bernd Burgstaller,
- Bernhard Scholz
Matrix computations of increasing size and complexity are widely used in scientific computing and engineering. However, current matrix language implementations lack programmer support to effectively and seamlessly utilize cloud computing resources. We ...
Distributed Cell Set: A Library for Space-Dependent Communication/Computation Overlap on Manycore Cluster
The increase in the number of cores available in modern processors makes it important for implementations to maximize core utilization within a node by overlapping communication and computation. However, when the dependencies between communication and ...
Towards Maximum Throughput of Dataflow Software Pipeline under Resource Constraints
This work proposes a novel algorithm and Integer Linear Programming (ILP) formulation to optimize the pipelined code mapping of a dataflow graph under a given budget generated by optimizing compilers. The goal of this optimization technique is to ...
Studying the expressiveness and performance of parallelization abstractions for linear pipelines
Semi-automatic parallelization provides abstractions that simplify the programming effort and allow the user to make decisions that cannot be made by tools. However, abstractions for general-purpose systems usually do not carry sufficient knowledge ...
Harmonic CUDA: Asynchronous Programming on GPUs
We introduce Harmonic CUDA, a dataflow programming model for GPUs that allows programmers to describe algorithms as a dependency graph of producers and consumers where data flows continuously through the graph for the duration of the kernel. This ...
MPI-based Remote OpenMP Offloading: A More Efficient and Easy-to-use Implementation
MPI+X is the most popular hybrid programming model for distributed computation on modern heterogeneous HPC systems. Nonetheless, for simplicity, HPC developers ideally would like to implement multi-node distributed parallel computing through a single ...
Exploring OpenMP GPU Offloading for Implementing Convolutional Neural Networks
Computing on heterogeneous architectures involving CPUs and accelerators is now a popular choice for parallel computing. As a directive-based programming model, OpenMP has become increasingly comprehensive, supporting a large variety of hardware ...