Issue Downloads
A High Accuracy Preserving Parallel Algorithm for Compact Schemes for DNS
A new accuracy-preserving parallel algorithm employing compact schemes is presented for direct numerical simulation (DNS) of the Navier-Stokes equations. Here, accuracy preservation means achieving the same level of accuracy as that obtained by the ...
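For readers unfamiliar with compact schemes, the classical fourth-order Padé scheme shown below (a background illustration, not necessarily the paper's specific scheme) hints at why parallelizing them is delicate: every derivative value is coupled to its neighbours through a tridiagonal system spanning the whole grid line, so naive domain decomposition perturbs the solution near subdomain boundaries.

```latex
% Classical fourth-order Pade compact scheme for the first derivative f'
% on a uniform grid with spacing h (illustration only; the paper's own
% scheme may differ):
\[
  f'_{i-1} + 4\,f'_{i} + f'_{i+1} \;=\; \frac{3}{h}\bigl(f_{i+1} - f_{i-1}\bigr)
\]
% Written for every interior node i, this is a tridiagonal linear system
% whose solution couples the entire grid line -- the coupling an
% accuracy-preserving parallel algorithm must respect.
```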
Optimizing the Linear Fascicle Evaluation Algorithm for Multi-core and Many-core Systems
Sparse matrix-vector multiplication (SpMV) operations are commonly used in various scientific and engineering applications. The performance of the SpMV operation often depends on exploiting regularity patterns in the matrix. Various representations and ...
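As background for the kind of kernel being optimized, a minimal SpMV in the common compressed sparse row (CSR) format is sketched below; the `row_ptr`/`col_idx`/`val` naming is illustrative, and the representations actually studied in the paper may differ.

```c
#include <stddef.h>

/* Minimal CSR sparse matrix-vector product y = A*x (illustrative sketch).
 * row_ptr has n_rows+1 entries; col_idx/val hold the nonzeros row by row.
 * The indirect load x[col_idx[k]] is why SpMV performance depends so
 * strongly on the sparsity pattern and the chosen matrix representation. */
void spmv_csr(size_t n_rows,
              const size_t *row_ptr,
              const size_t *col_idx,
              const double *val,
              const double *x,
              double *y)
{
    for (size_t i = 0; i < n_rows; ++i) {
        double sum = 0.0;
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            sum += val[k] * x[col_idx[k]];
        y[i] = sum;
    }
}
```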
A Modern Fortran Interface in OpenSHMEM: Need for Interoperability with Parallel Fortran Using Coarrays
Languages and libraries based on Partitioned Global Address Space (PGAS) programming models are convenient for exploiting scalable parallelism in large applications with irregular memory access patterns across different domains. OpenSHMEM is a PGAS-...
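The paper concerns OpenSHMEM's Fortran-facing interface; as a hedged illustration of the underlying PGAS model, the sketch below uses the standard C API to perform a one-sided put into another PE's symmetric memory, which is the style of communication the Fortran/coarray interface must interoperate with.

```c
#include <shmem.h>
#include <stdio.h>

/* One-sided communication in OpenSHMEM's C API (background sketch only;
 * the paper discusses the Fortran interface and coarray interoperability).
 * 'value' is a symmetric object: every PE owns a copy, and any PE may
 * write to a remote copy without the target's participation. */
static long value = 0;

int main(void)
{
    shmem_init();
    int me = shmem_my_pe();
    int npes = shmem_n_pes();

    long payload = (long)me;
    /* Each PE writes its rank into 'value' on the next PE (ring pattern). */
    shmem_long_put(&value, &payload, 1, (me + 1) % npes);

    shmem_barrier_all();   /* ensure all puts are complete and visible */
    printf("PE %d received %ld\n", me, value);

    shmem_finalize();
    return 0;
}
```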
Programming Strategies for Irregular Algorithms on the Emu Chick
- Eric R. Hein
- Srinivas Eswar
- Abdurrahman Yaşar
- Jiajia Li
- Jeffrey S. Young
- Thomas M. Conte
- Ümit V. Çatalyürek
- Richard Vuduc
- Jason Riedy
- Bora Uçar
The Emu Chick prototype implements migratory memory-side processing in a novel hardware system. Rather than transferring large amounts of data across the system interconnect, the Emu Chick moves lightweight thread contexts to near-memory cores before ...
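To see why migrating thread contexts can beat moving data, consider a classic pointer-chasing traversal: on a conventional machine, each hop may pull an entire cache line or network message across the interconnect just to read a few bytes. The sketch below is generic C for illustration, not code written for the Emu toolchain.

```c
#include <stddef.h>

/* Generic linked-list sum: an irregular, pointer-chasing access pattern
 * (ordinary C, not Emu-specific code).  Each hop touches an essentially
 * random address, so a conventional system moves a cache line or message
 * per element just to read 'value' and 'next'.  A migratory-thread machine
 * instead moves the small thread context to wherever the next node lives. */
struct node {
    long value;
    struct node *next;
};

long list_sum(const struct node *head)
{
    long sum = 0;
    for (const struct node *p = head; p != NULL; p = p->next)
        sum += p->value;
    return sum;
}
```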
Toward a Microarchitecture for Efficient Execution of Irregular Applications
Given the increasing importance of efficient data-intensive computing, we find that modern processor designs are not well suited to the irregular memory access patterns often found in such workloads. Applications and algorithms that do not exhibit ...
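A hedged example of the access pattern in question: scattered read-modify-write updates, such as building a histogram over arbitrary keys, defeat stride prefetchers and waste most of every cache line fetched, which is the mismatch a microarchitecture for irregular applications would need to address.

```c
#include <stddef.h>

/* Scattered updates, e.g. a histogram over arbitrary keys (illustrative).
 * The target hist[key[i]] is data-dependent, so stride prefetchers cannot
 * predict it, and each update typically uses only a few bytes of the cache
 * line brought in -- an irregular pattern conventional cores handle poorly. */
void histogram(size_t n, const size_t *key, long *hist)
{
    for (size_t i = 0; i < n; ++i)
        hist[key[i]] += 1;
}
```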
Automated Bug Detection for High-level Synthesis of Multi-threaded Irregular Applications
Field Programmable Gate Arrays (FPGAs) are becoming an appealing technology in datacenters and High Performance Computing. High-Level Synthesis (HLS) of multi-threaded parallel programs is increasingly used to extract parallelism. Despite great leaps ...