Distributed-memory -matrix Algebra I: Data Distribution and Matrix-vector Multiplication

Y Li, J Poulson, L Ying - arXiv preprint arXiv:2008.12441, 2020 - arxiv.org
arXiv preprint arXiv:2008.12441, 2020arxiv.org
We introduce a data distribution scheme for $\mathcal {H} $-matrices and a distributed-
memory algorithm for $\mathcal {H} $-matrix-vector multiplication. Our data distribution
scheme avoids an expensive $\Omega (P^ 2) $ scheduling procedure used in previous
work, where $ P $ is the number of processes, while data balancing is well-preserved.
Based on the data distribution, our distributed-memory algorithm evenly distributes all
computations among $ P $ processes and adopts a novel tree-communication algorithm to …
We introduce a data distribution scheme for -matrices and a distributed-memory algorithm for -matrix-vector multiplication. Our data distribution scheme avoids an expensive scheduling procedure used in previous work, where is the number of processes, while data balancing is well-preserved. Based on the data distribution, our distributed-memory algorithm evenly distributes all computations among processes and adopts a novel tree-communication algorithm to reduce the latency cost. The overall complexity of our algorithm is for -matrices under weak admissibility condition, where is the matrix size, denotes the latency, and denotes the inverse bandwidth. Numerically, our algorithm is applied to address both two- and three-dimensional problems of various sizes among various numbers of processes. On thousands of processes, good parallel efficiency is still observed.
arxiv.org