- research-article, November 2020
Optimizing the Linear Fascicle Evaluation Algorithm for Multi-core and Many-core Systems
ACM Transactions on Parallel Computing (TOPC), Volume 7, Issue 4, Article No.: 22, Pages 1–45, https://doi.org/10.1145/3418075
Sparse matrix-vector multiplication (SpMV) operations are commonly used in various scientific and engineering applications. The performance of the SpMV operation often depends on exploiting regularity patterns in the matrix. Various representations and ...
- research-article, March 2020
Load-balancing Sparse Matrix Vector Product Kernels on GPUs
Hartwig Anzt, Terry Cojean, Chen Yen-Chen, Jack Dongarra, Goran Flegar, Pratik Nayak, Stanimire Tomov, Yuhsiang M. Tsai, Weichung Wang
ACM Transactions on Parallel Computing (TOPC), Volume 7, Issue 1, Article No.: 2, Pages 1–26, https://doi.org/10.1145/3380930
Efficient processing of Irregular Matrices on Single Instruction, Multiple Data (SIMD)-type architectures is a persistent challenge. Resolving it requires innovations in the development of data formats, computational techniques, and implementations that ...
- research-article, March 2020
Acceleration of PageRank with Customized Precision Based on Mantissa Segmentation
ACM Transactions on Parallel Computing (TOPC), Volume 7, Issue 1, Article No.: 4, Pages 1–19, https://doi.org/10.1145/3380934
We describe the application of a communication-reduction technique for the PageRank algorithm that dynamically adapts the precision of the data access to the numerical requirements of the algorithm as the iteration converges. Our variable-precision ...
- research-article, January 2018
Partitioning Models for Scaling Parallel Sparse Matrix-Matrix Multiplication
ACM Transactions on Parallel Computing (TOPC), Volume 4, Issue 3, Article No.: 13, Pages 1–34, https://doi.org/10.1145/3155292
We investigate outer-product–parallel, inner-product–parallel, and row-by-row-product–parallel formulations of sparse matrix-matrix multiplication (SpGEMM) on distributed memory architectures. For each of these three formulations, we propose a ...
- research-article, January 2017
Trade-Offs Between Synchronization, Communication, and Computation in Parallel Linear Algebra Computations
ACM Transactions on Parallel Computing (TOPC), Volume 3, Issue 1, Article No.: 3, Pages 1–47, https://doi.org/10.1145/2897188
This article derives trade-offs between three basic costs of a parallel algorithm: synchronization, data movement, and computational cost. These trade-offs are lower bounds on the execution time of the algorithm that are independent of the number of ...
- research-article, December 2016
Hypergraph Partitioning for Sparse Matrix-Matrix Multiplication
ACM Transactions on Parallel Computing (TOPC), Volume 3, Issue 3, Article No.: 18, Pages 1–34, https://doi.org/10.1145/3015144
We propose a fine-grained hypergraph model for sparse matrix-matrix multiplication (SpGEMM), a key computational kernel in scientific computing and data analysis whose performance is often communication bound. This model correctly describes both the ...
- research-article, September 2015
Work-Efficient Matrix Inversion in Polylogarithmic Time
ACM Transactions on Parallel Computing (TOPC), Volume 2, Issue 3, Article No.: 15, Pages 1–29, https://doi.org/10.1145/2809812
We present an algorithm for inversion of symmetric positive definite matrices that combines the practical requirement of an optimal number of arithmetic operations and the theoretical goal of a polylogarithmic critical path length. The algorithm reduces ...
- research-article, April 2015
Noise-Tolerant Explicit Stencil Computations for Nonuniform Process Execution Rates
ACM Transactions on Parallel Computing (TOPC), Volume 2, Issue 1, Article No.: 7, Pages 1–33, https://doi.org/10.1145/2742351
Next-generation HPC computing platforms are likely to be characterized by significant, unpredictable nonuniformities in execution time among compute nodes and cores. The resulting load imbalances from this nonuniformity are expected to arise from a ...
- research-article, February 2015
Avoiding Communication in Successive Band Reduction
ACM Transactions on Parallel Computing (TOPC), Volume 1, Issue 2, Article No.: 11, Pages 1–37, https://doi.org/10.1145/2686877
The running time of an algorithm depends on both arithmetic and communication (i.e., data movement) costs, and the relative costs of communication are growing over time. In this work, we present sequential and distributed-memory parallel algorithms for ...