research-article

Matrix Multiplication I/O-Complexity by Path Routing

Authors:

Oded Schwartz

SPAA '15: Proceedings of the 27th ACM symposium on Parallelism in Algorithms and Architectures

Pages 35 - 45

https://doi.org/10.1145/2755573.2755594

Published: 13 June 2015

Abstract

We apply a novel technique based on path routings to obtain optimal I/O-complexity lower bounds for all Strassen-like fast matrix multiplication algorithms computed in serial or in parallel, assuming no reuse of nontrivial intermediate linear combinations. Given fast memory of size $M$, we prove an I/O-complexity lower bound of $\Omega\left(\left(\frac{n}{\sqrt{M}}\right)^{\omega_0} \cdot M\right)$ under this assumption for any Strassen-like matrix multiplication algorithm with arithmetic complexity $\Theta(n^{\omega_0})$, $\omega_0 < 3$, applied to $n \times n$ matrices. This generalizes an approach by Ballard, Demmel, Holtz, and Schwartz [6] that provides a tight lower bound for Strassen's matrix multiplication algorithm but does not apply to algorithms whose underlying computation graph has disconnected encoding or decoding components, or to algorithms with multiply copied values. We overcome these challenges via a new graph-theoretical approach for proving I/O-complexity lower bounds without the use of edge expansions.
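Dropping the constant hidden by the $\Omega$, the bound simplifies as $\left(\frac{n}{\sqrt{M}}\right)^{\omega_0} M = \frac{n^{\omega_0}}{M^{\omega_0/2 - 1}}$. Setting $\omega_0 = 3$ recovers the classical $n^3/\sqrt{M}$ bound of Hong and Kung [10], while Strassen's algorithm has $\omega_0 = \log_2 7 \approx 2.807$. The Python sketch below evaluates the expression to show how it scales with $n$ and $M$. The helper `io_lower_bound` and the sample sizes are illustrative assumptions, not code or data from the paper, and because the constant factor is unspecified only ratios between values are meaningful.

```python
import math


def io_lower_bound(n: float, M: float, omega0: float) -> float:
    """Evaluate (n / sqrt(M))**omega0 * M, the expression inside the Omega(...).

    Hypothetical helper for illustration: the asymptotic bound hides an
    unspecified constant factor, so only ratios between values are meaningful.
    """
    if n <= 0 or M <= 0:
        raise ValueError("n and M must be positive")
    return (n / math.sqrt(M)) ** omega0 * M


# Strassen's algorithm: 7 block products per 2x2 split, so omega0 = log2(7).
STRASSEN_OMEGA0 = math.log2(7)

if __name__ == "__main__":
    n, M = 2 ** 12, 2 ** 16  # illustrative sizes, not taken from the paper
    # omega0 = 3 reduces to the classical n**3 / sqrt(M) expression.
    print(io_lower_bound(n, M, 3.0), n ** 3 / math.sqrt(M))
    # Here n > sqrt(M), so Strassen's smaller exponent gives a smaller value.
    print(io_lower_bound(n, M, STRASSEN_OMEGA0))
    # Doubling n scales the expression by 2**omega0, i.e. about 7 for Strassen.
    print(io_lower_bound(2 * n, M, STRASSEN_OMEGA0)
          / io_lower_bound(n, M, STRASSEN_OMEGA0))
```

With these sizes $n/\sqrt{M} = 16$, so the first line prints the same value, 268435456.0, twice.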

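As a concrete reference point for the "Strassen-like" class and the encoding/decoding structure of its computation graph, the following is a minimal Python sketch of Strassen's own recursive scheme. It is a standard textbook formulation, not the paper's path-routing technique, and it makes no attempt to minimize I/O. Each recursion level forms linear combinations of the quadrants of A and of B (encoding), performs seven recursive products, and combines them into the quadrants of C (decoding), so for $n = 2^k$ it performs $7^k = n^{\log_2 7}$ scalar multiplications. The function names (`strassen`, `naive`, `split`) and the `counter` argument are hypothetical, and the sketch assumes $n$ is a power of two.

```python
import random
from typing import List

Matrix = List[List[int]]


def add(X: Matrix, Y: Matrix) -> Matrix:
    """Entrywise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X: Matrix, Y: Matrix) -> Matrix:
    """Entrywise difference of two equally sized matrices."""
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def split(X: Matrix):
    """Split a 2h x 2h matrix into its four h x h quadrants."""
    h = len(X) // 2
    return ([r[:h] for r in X[:h]], [r[h:] for r in X[:h]],
            [r[:h] for r in X[h:]], [r[h:] for r in X[h:]])


def strassen(A: Matrix, B: Matrix, counter: List[int]) -> Matrix:
    """Multiply n x n matrices, n a power of two, by Strassen's recursion.

    counter[0] is incremented once per scalar multiplication.
    """
    if len(A) == 1:
        counter[0] += 1
        return [[A[0][0] * B[0][0]]]
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # Encoding: each of the 7 products takes a linear combination of A-blocks
    # and a linear combination of B-blocks.
    M1 = strassen(add(A11, A22), add(B11, B22), counter)
    M2 = strassen(add(A21, A22), B11, counter)
    M3 = strassen(A11, sub(B12, B22), counter)
    M4 = strassen(A22, sub(B21, B11), counter)
    M5 = strassen(add(A11, A12), B22, counter)
    M6 = strassen(sub(A21, A11), add(B11, B12), counter)
    M7 = strassen(sub(A12, A22), add(B21, B22), counter)
    # Decoding: each block of C is a linear combination of the products.
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    # Reassemble the quadrants row by row.
    return ([r1 + r2 for r1, r2 in zip(C11, C12)] +
            [r1 + r2 for r1, r2 in zip(C21, C22)])


def naive(A: Matrix, B: Matrix) -> Matrix:
    """Classical triple-loop product, used only as a reference."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    n = 8
    A = [[random.randint(-5, 5) for _ in range(n)] for _ in range(n)]
    B = [[random.randint(-5, 5) for _ in range(n)] for _ in range(n)]
    count = [0]
    assert strassen(A, B, count) == naive(A, B)
    print(count[0])  # 343 == 7**3, i.e. n**log2(7) for n = 8
```

For $n = 8$ the counter reaches $7^3 = 343$ scalar multiplications, versus $8^3 = 512$ for the classical triple loop.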
References

[1]

G. Ballard, E. Carson, J. Demmel, M. Hoemmen, N. Knight, and O. Schwartz. Communication lower bounds and optimal algorithms for numerical linear algebra. Acta Numerica, 23:1--155, May 2014.

[2]

G. Ballard, J. Demmel, O. Holtz, B. Lipshitz, and O. Schwartz. Brief announcement: strong scaling of matrix multiplication algorithms and memory-independent communication lower bounds. In Proc. 24th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), SPAA '12, pages 77--79, New York, NY, USA, 2012. ACM.

[3]

G. Ballard, J. Demmel, O. Holtz, B. Lipshitz, and O. Schwartz. Communication-optimal parallel algorithm for Strassen's matrix multiplication. Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2012, 2012.

[4]

G. Ballard, J. Demmel, O. Holtz, B. Lipshitz, and O. Schwartz. Graph expansion analysis for communication costs of fast rectangular matrix multiplication. Design and Analysis of Algorithms, 7659:13--36, 2012.

[5]

G. Ballard, J. Demmel, O. Holtz, and O. Schwartz. Minimizing communication in numerical linear algebra. SIAM J. Matrix Anal. & Appl., 32(3):866--901, 2011.

[6]

G. Ballard, J. Demmel, O. Holtz, and O. Schwartz. Graph expansion and communication costs of fast matrix multiplication. Journal of the ACM, 59(6), 2012.

[7]

G. Bilardi, A. Pietracaprina, and P. D'Alberto. On the space and access complexity of computation dags. Proceedings of the 26th International Workshop on Graph-Theoretic Concepts in Computer Science, London, UK, pages 47--58, 2000.

[8]

D. Bini, M. Capovani, F. Romani, and G. Lotti. $O(n^{2.7799})$ complexity for $n \times n$ approximate matrix multiplication. Information Processing Letters, 8(5):234--235, 1979.

[9]

M. Christ, J. Demmel, N. Knight, T. Scanlon, and K. A. Yelick. Communication lower bounds and optimal algorithms for programs that reference arrays - part 1. Technical Report UCB/EECS-2013-61, EECS Department, University of California, Berkeley, May 2013.

[10]

J. W. Hong and H. T. Kung. I/O complexity: the red-blue pebble game. STOC 1981: Proceedings of the thirteenth annual ACM symposium on theory of computing, pages 326--333, 1981.

[11]

J. Hopcroft and L. Kerr. On minimizing the number of multiplications necessary for matrix multiplication. SIAM Journal on Applied Mathematics, 20(1):30--36, 1971.

[12]

D. Irony, S. Toledo, and A. Tiskin. Communication lower bounds for distributed-memory matrix multiplication. J. Parallel Distrib. Comput., 64(9):1017--1026, 2004.

[13]

L. H. Loomis and H. Whitney. An inequality related to the isoperimetric inequality. Bulletin of the American Mathematical Society, 55(10), 1949.

[14]

J. Savage. Space-time tradeoffs in memory hierarchies. Technical report, Brown University, Providence, RI, USA, 1994.

[15]

S. Winograd. On the number of multiplications required to compute certain functions. Proceedings of the National Academy of Science, 58(5), 1967.

[16]

C.-Q. Yang and B. Miller. Critical path analysis for the execution of parallel and distributed programs. Proceedings of the 8th International Conference on Distributed Computing Systems, pages 366--373, 1988.

Index Terms

Matrix Multiplication I/O-Complexity by Path Routing
1. Mathematics of computing
  1. Mathematical analysis
    1. Numerical analysis
      1. Computations on matrices

Information

Published In

ACM Conferences

SPAA '15: Proceedings of the 27th ACM symposium on Parallelism in Algorithms and Architectures

June 2015

362 pages

ISBN:9781450335881

DOI:10.1145/2755573

General Chair: Guy Blelloch, Carnegie Mellon University, USA
Program Chair: Kunal Agrawal, Washington University in St. Louis, USA

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

Ministry of Science and Technology Israel
Israel Science Foundation

Conference

SPAA '15: 27th ACM Symposium on Parallelism in Algorithms and Architectures

June 13-15, 2015

Portland, Oregon, USA

Acceptance Rates

SPAA '15 paper acceptance rate: 31 of 131 submissions (24%)

Overall acceptance rate: 447 of 1,461 submissions (31%)

Bibliometrics

Total Citations: 14
Total Downloads: 214
Downloads (last 12 months): 15
Downloads (last 6 weeks): 2
Reflects downloads up to

Cited By

Böhnlein T, Papp P, Yzelman A, Agrawal K, Petrank E (2024). Brief Announcement: Red-Blue Pebbling with Multiple Processors: Time, Communication and Memory Trade-offs. Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures, pages 285-287. Online publication date: 17-Jun-2024.
https://dl.acm.org/doi/10.1145/3626183.3660269

Eyraud-Dubois L, Iooss G, Langou J, Rastello F, Agrawal K, Petrank E (2024). Tightening I/O Lower Bounds through the Hourglass Dependency Pattern. Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures, pages 183-193. Online publication date: 17-Jun-2024.
https://dl.acm.org/doi/10.1145/3626183.3659986

Moran Y, Schwartz O, Agrawal K, Shun J (2023). Multiplying 2 × 2 Sub-Blocks Using 4 Multiplications. Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures, pages 379-390. Online publication date: 17-Jun-2023.
https://dl.acm.org/doi/10.1145/3558481.3591083

Karstadt E, Schwartz O (2020). Matrix Multiplication, a Little Faster. Journal of the ACM, 67(1):1-31. Online publication date: 15-Jan-2020.
https://dl.acm.org/doi/10.1145/3364504

Jain S, Zaharia M, Scheideler C, Spear M (2020). Spectral Lower Bounds on the I/O Complexity of Computation Graphs. Proceedings of the 32nd ACM Symposium on Parallelism in Algorithms and Architectures, pages 329-338. Online publication date: 6-Jul-2020.
https://dl.acm.org/doi/10.1145/3350755.3400210

Bilardi G, De Stefani L, Chan T (2019). The I/O complexity of Toom-Cook integer multiplication. Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2034-2052. Online publication date: 6-Jan-2019.
https://dl.acm.org/doi/10.5555/3310435.3310558

Kwasniewski G, Kabić M, Besta M, VandeVondele J, Solcà R, Hoefler T, Taufer M, Balaji P, Peña A (2019). Red-blue pebbling revisited. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-22. Online publication date: 17-Nov-2019.
https://dl.acm.org/doi/10.1145/3295500.3356181

Weiss E, Schwartz O (2019). Computation of Matrix Chain Products on Parallel Machines. 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 491-500. Online publication date: May-2019.
https://doi.org/10.1109/IPDPS.2019.00059

Nissim R, Schwartz O (2019). Revisiting the I/O-Complexity of Fast Matrix Multiplication with Recomputations. 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 482-490. Online publication date: May-2019.
https://doi.org/10.1109/IPDPS.2019.00058

Karstadt E, Schwartz O, Scheideler C, Hajiaghayi M (2017). Matrix Multiplication, a Little Faster. Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, pages 101-110. Online publication date: 24-Jul-2017.
https://dl.acm.org/doi/10.1145/3087556.3087579
