research-article

Communication optimal parallel multiplication of sparse random matrices

Authors:

Benjamin Lipshitz,

Sivan Toledo

SPAA '13: Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures

Pages 222 - 231

https://doi.org/10.1145/2486159.2486196

Published: 23 July 2013

Abstract

Parallel algorithms for sparse matrix-matrix multiplication typically spend most of their time on inter-processor communication rather than on computation, and hardware trends predict the relative cost of communication will only increase. Thus, sparse matrix multiplication algorithms must minimize communication costs in order to scale to large processor counts.

In this paper, we consider multiplying sparse matrices corresponding to Erdős-Rényi random graphs on distributed-memory parallel machines. We prove a new lower bound on the expected communication cost for a wide class of algorithms. Our analysis of existing algorithms shows that, while some are optimal for a limited range of matrix density and number of processors, none is optimal in general. We obtain two new parallel algorithms and prove that they match the expected communication cost lower bound, and hence they are optimal.
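To make the problem setting concrete, here is a minimal serial sketch of multiplying two Erdős-Rényi-style sparse matrices. It is not one of the paper's parallel algorithms and does not model communication. The parametrization is an assumption made for illustration (each entry is nonzero independently with probability d/n, so d is the expected number of nonzeros per row), and the names `erdos_renyi_sparse` and `spgemm` are hypothetical.

```python
import random


def erdos_renyi_sparse(n, d, seed=None):
    """Sparse n-by-n matrix stored as {row: {col: value}}.

    Assumption: each entry is nonzero independently with probability d/n,
    so d is the expected number of nonzeros per row.
    """
    rng = random.Random(seed)
    p = d / n
    rows = {}
    for i in range(n):
        for j in range(n):
            if rng.random() < p:
                rows.setdefault(i, {})[j] = 1.0
    return rows


def spgemm(a, b):
    """Serial row-by-row product C = A * B; also counts scalar multiplications."""
    c = {}
    flops = 0
    for i, a_row in a.items():
        acc = {}  # sparse accumulator for row i of C
        for k, a_ik in a_row.items():
            # Only rows of B that match a nonzero column of A contribute.
            for j, b_kj in b.get(k, {}).items():
                acc[j] = acc.get(j, 0.0) + a_ik * b_kj
                flops += 1
        if acc:
            c[i] = acc
    return c, flops


if __name__ == "__main__":
    n, d = 400, 4
    a = erdos_renyi_sparse(n, d, seed=0)
    b = erdos_renyi_sparse(n, d, seed=1)
    c, flops = spgemm(a, b)
    nnz_c = sum(len(row) for row in c.values())
    # There are n^3 index triples (i, k, j); each needs a multiplication
    # with probability (d/n)^2, so the expected count is d^2 * n.
    print("multiplications:", flops, "expected about:", d * d * n)
    print("nonzeros in C:", nnz_c)
```

Under this assumed model the expected number of scalar multiplications is d²n, far below the n³ of a dense product. With so little arithmetic per processor, the cost of moving matrix entries between processors can plausibly dominate the running time.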

Index Terms

Communication optimal parallel multiplication of sparse random matrices
1. Mathematics of computing
  1. Mathematical analysis
    1. Numerical analysis
      1. Computations on matrices

Published In

SPAA '13: Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures

July 2013

348 pages

ISBN:9781450315722

DOI:10.1145/2486159

General Chair: Guy Blelloch, Carnegie Mellon University, USA

Program Chair: Berthold Vöcking, RWTH Aachen University, Germany

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SPAA '13: 25th ACM Symposium on Parallelism in Algorithms and Architectures

July 23 - 25, 2013

Montréal, Québec, Canada

Acceptance Rates

SPAA '13 Paper Acceptance Rate 31 of 130 submissions, 24%;

Overall Acceptance Rate 447 of 1,461 submissions, 31%
