- poster, September 2016
POSTER: hVISC: A Portable Abstraction for Heterogeneous Parallel Systems
PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, Pages 443–445, https://doi.org/10.1145/2967938.2976039
Programming heterogeneous parallel systems can be extremely complex because a single system may include multiple different parallelism models, instruction sets, and memory hierarchies, and different systems use different combinations of these features. ...
- poster, September 2016
POSTER: Exploiting Asymmetric Multi-Core Processors with Flexible System Software
- Kallia Chronaki,
- Miquel Moretó,
- Marc Casas,
- Alejandro Rico,
- Rosa M. Badia,
- Eduard Ayguadé,
- Jesus Labarta,
- Mateo Valero
PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, Pages 415–417, https://doi.org/10.1145/2967938.2976038
Energy efficiency has become the main challenge for high performance computing (HPC). The use of mobile asymmetric multi-core architectures to build future multi-core systems is an approach towards energy savings while keeping high performance. However, ...
- poster, September 2016
POSTER: Hybrid Data Dependence Analysis for Loop Transformations
PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, Pages 439–440, https://doi.org/10.1145/2967938.2974059
Loop optimizations span from vectorization, scalar promotion, loop invariant code motion, software pipelining to loop fusion, skewing, tiling and loop parallelization. These transformations are essential in the quest for automated high-performance code ...
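As a rough illustration of the kind of question such a dependence analysis must answer (this sketch is not from the paper, and the arrays are hypothetical), consider two loops: one with no loop-carried dependence and one whose iterations depend on each other.

```cpp
#include <cstddef>
#include <vector>

// Minimal sketch: two loops a dependence analysis must distinguish before
// applying transformations such as parallelization, fusion, or tiling.
void independent(std::vector<double>& a, const std::vector<double>& b) {
    // No loop-carried dependence: each iteration reads only b[i] and writes
    // only a[i], so the loop can be parallelized or vectorized freely.
    for (std::size_t i = 0; i < a.size(); ++i)
        a[i] = 2.0 * b[i];
}

void carried(std::vector<double>& a) {
    // Loop-carried dependence: iteration i reads a[i - 1] written by the
    // previous iteration, so naive parallelization would be unsafe.
    for (std::size_t i = 1; i < a.size(); ++i)
        a[i] = a[i] + a[i - 1];
}
```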
- poster, September 2016
POSTER: Collective Dynamic Parallelism for Directive Based GPU Programming Languages and Compilers
PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, Pages 423–424, https://doi.org/10.1145/2967938.2974056
Early programs for GPU (Graphics Processing Units) acceleration were based on a flat, bulk parallel programming model, in which programs had to perform a sequence of kernel launches from the host CPU. In the latest releases of these devices, dynamic (or ...
- poster, September 2016
POSTER: Pagoda: A Runtime System to Maximize GPU Utilization in Data Parallel Tasks with Limited Parallelism
PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, Pages 449–450, https://doi.org/10.1145/2967938.2974055
Massively multithreaded GPUs achieve high throughput by running thousands of threads in parallel. To fully utilize the hardware, contemporary workloads spawn work to the GPU in bulk by launching large tasks, where each task is a kernel that contains ...
- poster, September 2016
POSTER: An Optimization of Dataflow Architectures for Scientific Applications
PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, Pages 441–442, https://doi.org/10.1145/2967938.2974054
Dataflow computing has proved to be promising in high-performance computing. However, traditional dataflow architectures are general-purpose and not efficient enough when dealing with typical scientific applications due to low utilization of function ...
- abstract, September 2016
Student Research Poster: Software Out-of-Order Execution for In-Order Architectures
PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, Page 458, https://doi.org/10.1145/2967938.2971466
Processor cores are divided into two categories: fast and power-hungry out-of-order processors, and efficient, but slower in-order processors. To achieve high performance with low-energy budgets, this proposal aims to deliver out-of-order processing by ...
- research-article, September 2016
A DSL Compiler for Accelerating Image Processing Pipelines on FPGAs
PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, Pages 327–338, https://doi.org/10.1145/2967938.2967969
This paper describes an automatic approach to accelerate image processing pipelines using FPGAs. An image processing pipeline can be viewed as a graph of interconnected stages that processes images successively. Each stage typically performs a point-...
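For readers unfamiliar with the pipeline model the abstract describes, here is a minimal sketch (hypothetical stages, not the paper's DSL) of an image pipeline as a chain of point-wise stages, each consuming the previous stage's output.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal sketch: an image pipeline as a graph (here a simple chain) of
// point-wise stages. Stage names and constants are illustrative only.
using Image = std::vector<std::uint8_t>;

Image brighten(const Image& in) {   // point-wise stage 1
    Image out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = static_cast<std::uint8_t>(std::min(255, in[i] + 20));
    return out;
}

Image threshold(const Image& in) {  // point-wise stage 2
    Image out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = in[i] > 128 ? 255 : 0;
    return out;
}

Image pipeline(const Image& in) {   // stages composed back to back
    return threshold(brighten(in));
}
```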
- research-article, September 2016
A Static Cut-off for Task Parallel Programs
PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, Pages 139–150, https://doi.org/10.1145/2967938.2967968
Task parallel models supporting dynamic and hierarchical parallelism are believed to offer a promising direction to achieving higher performance and programmability. Divide-and-conquer is the most frequently used idiom in task parallel models, which ...
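A minimal sketch of the cut-off idiom in divide-and-conquer task parallelism (illustrative only, not the paper's technique; the threshold value and OpenMP usage are assumptions): below a chosen size, recursion stops spawning tasks and falls back to a serial loop so that task-creation overhead does not dominate small subproblems.

```cpp
#include <cstddef>

// Assumed cut-off threshold; in practice it is tuned per machine or, as the
// paper's title suggests, derived statically.
constexpr std::size_t CUTOFF = 1024;

// Divide-and-conquer sum with a cut-off. Assumes it is invoked from inside an
// OpenMP parallel/single region; without OpenMP the pragmas are ignored and
// the code runs serially with the same result.
long long sum(const long long* a, std::size_t n) {
    if (n <= CUTOFF) {                  // serial leaf: no task overhead
        long long s = 0;
        for (std::size_t i = 0; i < n; ++i) s += a[i];
        return s;
    }
    long long left = 0, right = 0;
    #pragma omp task shared(left)       // spawn a child task for one half
    left = sum(a, n / 2);
    right = sum(a + n / 2, n - n / 2);  // compute the other half here
    #pragma omp taskwait                // join before combining
    return left + right;
}
```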
- research-article, September 2016
Resource Conscious Reuse-Driven Tiling for GPUs
- Prashant Singh Rawat,
- Changwan Hong,
- Mahesh Ravishankar,
- Vinod Grover,
- Louis-Noel Pouchet,
- Atanas Rountev,
- P. Sadayappan
PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, Pages 99–111, https://doi.org/10.1145/2967938.2967967
Computations involving successive application of 3D stencil operators are widely used in many application domains, such as image processing, computational electromagnetics, seismic processing, and climate modeling. Enhancement of temporal and spatial ...
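As background for the abstract above, here is a minimal, untiled sketch of a 7-point 3D stencil sweep (illustrative only, not the paper's code). Reuse-driven tiling would block the i/j/k loops so that neighbouring points stay resident in cache or shared memory across iterations.

```cpp
#include <vector>

// One sweep of a 7-point 3D stencil on an n x n x n grid stored row-major.
// Assumes in and out both have n*n*n elements; coefficients are arbitrary.
void stencil7(const std::vector<float>& in, std::vector<float>& out, int n) {
    auto idx = [n](int i, int j, int k) { return (i * n + j) * n + k; };
    for (int i = 1; i < n - 1; ++i)
        for (int j = 1; j < n - 1; ++j)
            for (int k = 1; k < n - 1; ++k)
                out[idx(i, j, k)] = 0.4f * in[idx(i, j, k)]
                    + 0.1f * (in[idx(i - 1, j, k)] + in[idx(i + 1, j, k)]
                            + in[idx(i, j - 1, k)] + in[idx(i, j + 1, k)]
                            + in[idx(i, j, k - 1)] + in[idx(i, j, k + 1)]);
}
```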
- research-article, September 2016
Reducing Cache Coherence Traffic with Hierarchical Directory Cache and NUMA-Aware Runtime Scheduling
- Paul Caheny,
- Marc Casas,
- Miquel Moretó,
- Hervé Gloaguen,
- Maxime Saintes,
- Eduard Ayguadé,
- Jesús Labarta,
- Mateo Valero
PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, Pages 275–286, https://doi.org/10.1145/2967938.2967962
Cache Coherent NUMA (ccNUMA) architectures are a widespread paradigm due to the benefits they provide for scaling core count and memory capacity. Also, the flat memory address space they offer considerably improves programmability. However, ccNUMA ...
- research-article, September 2016
Speculatively Exploiting Cross-Invocation Parallelism
- Jialu Huang,
- Prakash Prabhu,
- Thomas B. Jablin,
- Soumyadeep Ghosh,
- Sotiris Apostolakis,
- Jae W. Lee,
- David I. August
PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, Pages 207–221, https://doi.org/10.1145/2967938.2967959
Automatic parallelization has shown promise in producing scalable multi-threaded programs for multi-core architectures. Most existing automatic techniques parallelize independent loops and insert global synchronization between loop invocations. For ...
- research-article, September 2016
Automatically Exploiting Implicit Pipeline Parallelism from Multiple Dependent Kernels for GPUs
PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, Pages 341–352, https://doi.org/10.1145/2967938.2967952
Execution of GPGPU workloads consists of different stages including data I/O on the CPU, memory copy between the CPU and GPU, and kernel execution. While the GPU can remain idle during I/O and memory copy, prior work has shown that overlapping data movement ...
- research-article, September 2016
Reduction Drawing: Language Constructs and Polyhedral Compilation for Reductions on GPU
PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, Pages 87–97, https://doi.org/10.1145/2967938.2967950
Reductions are common in scientific and data-crunching codes, and a typical source of bottlenecks on massively parallel architectures such as GPUs. Reductions are memory-bound, and achieving peak performance involves sophisticated optimizations. There ...
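A minimal CPU-side sketch of a reduction (not the paper's GPU code generation, and the OpenMP clause is an assumption used only for illustration): all elements are combined into one value with an associative operator, and each element is read once with little reuse, which is why reductions tend to be memory-bound.

```cpp
#include <cstddef>
#include <vector>

// Dot product as a reduction: combine all x[i] * y[i] into one accumulator.
// Without OpenMP the pragma is ignored and the loop runs serially.
double dot(const std::vector<double>& x, const std::vector<double>& y) {
    double acc = 0.0;
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(x.size());
    #pragma omp parallel for reduction(+:acc)   // parallel combine with +
    for (std::ptrdiff_t i = 0; i < n; ++i)
        acc += x[i] * y[i];
    return acc;
}
```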
- research-article, September 2016
Hash Map Inlining
PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, Pages 235–246, https://doi.org/10.1145/2967938.2967949
Scripting languages like Javascript and PHP are widely used to implement application logic for dynamically-generated web pages. Their popularity is due in large part to their flexible syntax and dynamic type system, which enable rapid turnaround time ...
- research-article, September 2016
Optimizing Indirect Memory References with milk
PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, Pages 299–312, https://doi.org/10.1145/2967938.2967948
Modern applications such as graph and data analytics, when operating on real world data, have working sets much larger than cache capacity and are bottlenecked by DRAM. To make matters worse, DRAM bandwidth is increasing much slower than per CPU core ...
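For context on what an "indirect memory reference" is (a plain C++ sketch, not milk's syntax): the access pattern data[idx[i]] is typical of graph and data analytics, and when the index stream is effectively random and the data far exceeds cache capacity, nearly every access misses to DRAM; work in this space reorders or batches such accesses to recover locality.

```cpp
#include <cstddef>
#include <vector>

// Indirect (gather) access pattern: the index array decides where each load
// goes, so consecutive iterations touch unrelated cache lines.
double gather_sum(const std::vector<double>& data,
                  const std::vector<std::size_t>& idx) {
    double s = 0.0;
    for (std::size_t i = 0; i < idx.size(); ++i)
        s += data[idx[i]];   // data[idx[i]] jumps around memory
    return s;
}
```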
- research-article, September 2016
Fusion of Parallel Array Operations
PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, Pages 71–85, https://doi.org/10.1145/2967938.2967945
We address the problem of fusing array operations based on criteria such as shape compatibility, data reuse, and minimizing for data reuse, the fusion problem has been formulated as a static weighted graph partitioning problem (known as the Weighted ...
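To make the fusion idea concrete, here is a minimal before/after sketch (illustrative only, not the paper's framework): two element-wise operations over arrays of the same shape are fused into one loop, eliminating the temporary array and reusing each element while it is still in registers or cache.

```cpp
#include <cstddef>
#include <vector>

// Unfused: two passes over the data and an intermediate array t.
std::vector<double> unfused(const std::vector<double>& a,
                            const std::vector<double>& b) {
    std::vector<double> t(a.size()), r(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) t[i] = a[i] + b[i];  // op 1
    for (std::size_t i = 0; i < a.size(); ++i) r[i] = t[i] * 2.0;   // op 2
    return r;
}

// Fused: same result in one pass, no temporary, better data reuse.
std::vector<double> fused(const std::vector<double>& a,
                          const std::vector<double>& b) {
    std::vector<double> r(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        r[i] = (a[i] + b[i]) * 2.0;
    return r;
}
```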
- research-article, September 2016
Bridging the Semantic Gaps of GPU Acceleration for Scale-out CNN-based Big Data Processing: Think Big, See Small
PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, Pages 315–326, https://doi.org/10.1145/2967938.2967944
Convolutional Neural Networks (CNNs) have substantially advanced the state-of-the-art accuracies of object recognition, which is the core function of a myriad of modern multimedia processing techniques such as image/video processing, speech recognition, ...
- research-article, September 2016
Sparso: Context-driven Optimizations of Sparse Linear Algebra
PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, Pages 247–259, https://doi.org/10.1145/2967938.2967943
The sparse matrix is a key data structure in various domains such as high-performance computing, machine learning, and graph analytics. To maximize performance of sparse matrix operations, it is especially important to optimize across the operations and ...
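As a point of reference for the abstract above, here is a minimal sparse matrix-vector multiply in CSR format (illustrative only, not Sparso's implementation): the kind of kernel whose performance depends on context such as the matrix's sparsity structure, which context-driven approaches exploit across a sequence of operations.

```cpp
#include <cstddef>
#include <vector>

// Compressed Sparse Row storage: only nonzeros are kept, addressed per row.
struct CsrMatrix {
    std::vector<int>    row_ptr;  // size nrows + 1; start of each row's nonzeros
    std::vector<int>    col_idx;  // column index of each nonzero
    std::vector<double> val;      // value of each nonzero
};

// y = A * x for a CSR matrix A; x is indexed indirectly through col_idx.
std::vector<double> spmv(const CsrMatrix& A, const std::vector<double>& x) {
    std::vector<double> y(A.row_ptr.size() - 1, 0.0);
    for (std::size_t row = 0; row + 1 < A.row_ptr.size(); ++row)
        for (int k = A.row_ptr[row]; k < A.row_ptr[row + 1]; ++k)
            y[row] += A.val[k] * x[A.col_idx[k]];
    return y;
}
```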
- proceeding, September 2016
PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation
The International Conference on Parallel Architectures and Compilation Techniques (PACT) started as a Data Flow Workshop held in conjunction with ISCA 1989 in Israel and quickly evolved into a unique venue at the intersection of parallel architecture ...