Brief announcement: strong scaling of matrix multiplication algorithms and memory-independent communication lower bounds

Authors:

Oded Schwartz

SPAA '12: Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures

Pages 77 - 79

https://doi.org/10.1145/2312005.2312021

Published: 25 June 2012

Abstract

A parallel algorithm has perfect strong scaling if its running time on $P$ processors is linear in $1/P$, including all communication costs. Distributed-memory parallel algorithms for matrix multiplication with perfect strong scaling have only recently been found. One is based on classical matrix multiplication (Solomonik and Demmel, 2011), and one is based on Strassen's fast matrix multiplication (Ballard, Demmel, Holtz, Lipshitz, and Schwartz, 2012). Both algorithms scale perfectly, but only up to some number of processors where the inter-processor communication no longer scales. We obtain a memory-independent communication cost lower bound on classical and Strassen-based distributed-memory matrix multiplication algorithms. These bounds imply that no classical or Strassen-based parallel matrix multiplication algorithm can strongly scale perfectly beyond the ranges already attained by the two parallel algorithms mentioned above. The memory-independent bounds and the strong scaling bounds generalize to other algorithms.
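To illustrate why a memory-independent bound caps the strong-scaling range, the sketch below compares the two kinds of per-processor communication lower bound in their commonly stated asymptotic forms, with constants omitted. The symbols are not defined in the abstract and are assumptions introduced here: $n$ is the matrix dimension, $M$ the local memory size per processor, $W$ the number of words communicated by some processor, and $\omega_0 = \log_2 7$ the exponent of Strassen's algorithm.

$$
W_{\text{classical}} = \Omega\!\left(\max\left\{\frac{n^3}{P\sqrt{M}},\ \frac{n^2}{P^{2/3}}\right\}\right),
\qquad
W_{\text{Strassen}} = \Omega\!\left(\max\left\{\frac{n^{\omega_0}}{P\,M^{\omega_0/2-1}},\ \frac{n^2}{P^{2/\omega_0}}\right\}\right)
$$

In each pair, the first (memory-dependent) term is linear in $1/P$, which is consistent with perfect strong scaling, while the second (memory-independent) term decreases more slowly. Setting the two terms equal gives the processor count beyond which the memory-independent term dominates:

$$
\frac{n^3}{P\sqrt{M}} = \frac{n^2}{P^{2/3}} \iff P = \frac{n^3}{M^{3/2}},
\qquad
\frac{n^{\omega_0}}{P\,M^{\omega_0/2-1}} = \frac{n^2}{P^{2/\omega_0}} \iff P = \frac{n^{\omega_0}}{M^{\omega_0/2}}
$$

Under these assumptions, communication can scale as $1/P$ only up to $P = \Theta(n^3/M^{3/2})$ for classical algorithms and $P = \Theta(n^{\omega_0}/M^{\omega_0/2})$ for Strassen-based ones.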

References

[1] Aggarwal, A., Chandra, A. K., and Snir, M. Communication complexity of PRAMs. Theoretical Computer Science 71, 1 (1990), 3--28.

[2] Ballard, G., Demmel, J., Holtz, O., Lipshitz, B., and Schwartz, O. Communication-optimal parallel algorithm for Strassen's matrix multiplication. In SPAA '12: Proceedings of the 24th Annual Symposium on Parallelism in Algorithms and Architectures (New York, NY, USA, 2012), ACM.

[3] Ballard, G., Demmel, J., Holtz, O., Lipshitz, B., and Schwartz, O. Strong scaling of matrix multiplication algorithms and memory-independent communication lower bounds. EECS Technical Report EECS-2012-31, UC Berkeley, Mar. 2012.

[4] Ballard, G., Demmel, J., Holtz, O., and Schwartz, O. Graph expansion and communication costs of fast matrix multiplication. In SPAA '11: Proceedings of the 23rd Annual Symposium on Parallelism in Algorithms and Architectures (New York, NY, USA, 2011), ACM, pp. 1--12.

[5] Ballard, G., Demmel, J., Holtz, O., and Schwartz, O. Minimizing communication in numerical linear algebra. SIAM J. Matrix Analysis Applications 32, 3 (2011), 866--901.

[6] Irony, D., Toledo, S., and Tiskin, A. Communication lower bounds for distributed-memory matrix multiplication. J. Parallel Distrib. Comput. 64, 9 (2004), 1017--1026.

[7] Loomis, L. H., and Whitney, H. An inequality related to the isoperimetric inequality. Bulletin of the AMS 55 (1949), 961--962.

[8] Solomonik, E., and Demmel, J. Communication-optimal parallel 2.5D matrix multiplication and LU factorization algorithms. In Euro-Par '11: Proceedings of the 17th International European Conference on Parallel and Distributed Computing (2011), Springer.

Index Terms

1. Mathematics of computing
  1. Mathematical analysis
    1. Numerical analysis
      1. Computations on matrices

Published In

SPAA '12: Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures

June 2012

348 pages

ISBN:9781450312134

DOI:10.1145/2312005

General Chair: Guy Blelloch (Carnegie Mellon University, USA)

Program Chair: Maurice Herlihy (Brown University, USA)

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Abstract

Conference

SPAA '12: 24th ACM Symposium on Parallelism in Algorithms and Architectures

June 25-27, 2012

Pittsburgh, Pennsylvania, USA

Acceptance Rates

Overall Acceptance Rate 447 of 1,461 submissions, 31%

Bibliometrics

Article Metrics

Total Citations: 52
Total Downloads: 267
Downloads (Last 12 months): 9
Downloads (Last 6 weeks): 0

Reflects downloads up to 28 Jan 2025

Cited By

Al Daas H, Ballard G, Grigori L, Kumar S, Rouse K, Agrawal K, Shun J (2023). Parallel Memory-Independent Communication Bounds for SYRK. Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures, 391-401. Online publication date: 17-Jun-2023.
https://dl.acm.org/doi/10.1145/3558481.3591072

Tang Y, Gao W (2021). Processor-Aware Cache-Oblivious Algorithms. Proceedings of the 50th International Conference on Parallel Processing, 1-10. Online publication date: 9-Aug-2021.
https://dl.acm.org/doi/10.1145/3472456.3472506

Tang Y (2020). Improving the Space-Time Efficiency of Matrix Multiplication Algorithms. Workshop Proceedings of the 49th International Conference on Parallel Processing, 1-10. Online publication date: 17-Aug-2020.
https://dl.acm.org/doi/10.1145/3409390.3409404

Tang Y, Scheideler C, Spear M (2020). Balanced Partitioning of Several Cache-Oblivious Algorithms. Proceedings of the 32nd ACM Symposium on Parallelism in Algorithms and Architectures, 575-577. Online publication date: 6-Jul-2020.
https://dl.acm.org/doi/10.1145/3350755.3400214

Demirci G, Aykanat C (2020). Cartesian Partitioning Models for 2D and 3D Parallel SpGEMM Algorithms. IEEE Transactions on Parallel and Distributed Systems 31:12, 2763-2775. Online publication date: 1-Dec-2020.
https://doi.org/10.1109/TPDS.2020.3000708

Xiao J, Peng J (2019). Trade-offs between computation, communication, and synchronization in stencil-collective alternate update. CCF Transactions on High Performance Computing 1:2, 144-160. Online publication date: 26-Jul-2019.
https://doi.org/10.1007/s42514-019-00011-x

Ramanan P, Afrati F, Sroka J, Yi K, Hidders J (2018). Six Pass MapReduce Implementation of Strassen's Algorithm for Matrix Multiplication. Proceedings of the 5th ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond, 1-6. Online publication date: 15-Jun-2018.
https://dl.acm.org/doi/10.1145/3206333.3206336

Ballard G, Knight N, Rouse K (2018). Communication Lower Bounds for Matricized Tensor Times Khatri-Rao Product. 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 557-567. Online publication date: May-2018.
https://doi.org/10.1109/IPDPS.2018.00065

Zhang J, Zhou X (2018). A parallel algorithm for matrix fast exponentiation based on MPI. 2018 IEEE 3rd International Conference on Big Data Analysis (ICBDA), 162-165. Online publication date: Mar-2018.
https://doi.org/10.1109/ICBDA.2018.8367669

Deng M, Ramanan P, Afrati F, Sroka J, Koutris P (2017). MapReduce Implementation of Strassen's Algorithm for Matrix Multiplication. Proceedings of the 4th ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond, 1-10. Online publication date: 14-May-2017.
https://dl.acm.org/doi/10.1145/3070607.3070614
