DOI: 10.1145/2755573.2755594

Matrix Multiplication I/O-Complexity by Path Routing

Published: 13 June 2015

Abstract

We apply a novel technique based on path routings to obtain optimal I/O-complexity lower bounds for all Strassen-like fast matrix multiplication algorithms computed in serial or in parallel, assuming no reuse of nontrivial intermediate linear combinations. Given fast memory of size $M$, we prove an I/O-complexity lower bound of $\Omega\left((n/\sqrt{M})^{\omega_0} M\right)$ for any Strassen-like matrix multiplication algorithm applied to $n \times n$ matrices of arithmetic complexity $\Theta(n^{\omega_0})$ with $\omega_0 < 3$ under this assumption. This generalizes an approach by Ballard, Demmel, Holtz, and Schwartz that provides a tight lower bound for Strassen's matrix multiplication algorithm but which does not apply to algorithms with disconnected encoding or decoding components of the underlying computation graph or algorithms with multiply copied values. We overcome these challenges via a new graph-theoretical approach for proving I/O-complexity lower bounds without the use of edge expansions.
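
To get a feel for the size of the bound stated in the abstract, the following minimal sketch evaluates only its leading term, $(n/\sqrt{M})^{\omega_0} M$, ignoring the constant hidden by the $\Omega$ notation. The function name `io_lower_bound_term` and the sample values of `n` and `M` are illustrative assumptions, not taken from the paper. The exponent for Strassen's algorithm is $\omega_0 = \log_2 7$. The $\omega_0 = 3$ case is outside the paper's result (which requires $\omega_0 < 3$) and is included only as a point of comparison with the classical $n^3/\sqrt{M}$ bound.

```python
import math


def io_lower_bound_term(n: int, M: int, omega0: float) -> float:
    """Leading term (n / sqrt(M))**omega0 * M of the I/O lower bound.

    Hypothetical helper for illustration: the constant factor hidden by
    the Omega notation is ignored, so only ratios are meaningful.
    """
    return (n / math.sqrt(M)) ** omega0 * M


# Strassen's algorithm multiplies 2 x 2 blocks with 7 multiplications,
# so its arithmetic-complexity exponent is log2(7), about 2.807.
STRASSEN_OMEGA0 = math.log2(7)

# Sample sizes (assumed for illustration): matrix dimension and fast-memory size.
n, M = 4096, 1024

strassen = io_lower_bound_term(n, M, STRASSEN_OMEGA0)
# omega0 = 3 is not covered by the paper's theorem; shown only for comparison.
cubic = io_lower_bound_term(n, M, 3.0)

print(f"Strassen-like term (omega0 = log2 7): {strassen:.3e}")
print(f"Classical term     (omega0 = 3):      {cubic:.3e}")
print(f"Ratio classical / Strassen:           {cubic / strassen:.2f}")
```

Because the constants are dropped, the printed numbers are best read as a ratio between the two exponents rather than as actual word counts.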

References

[1]
G. Ballard, E. Carson, J. Demmel, M. Hoemmen, N. Knight, and O. Schwartz. Communication lower bounds and optimal algorithms for numerical linear algebra. Acta Numerica, 23:1--155, 5 2014.
[2]
G. Ballard, J. Demmel, O. Holtz, B. Lipshitz, and O. Schwartz. Brief announcement: strong scaling of matrix multiplication algorithms and memory-independent communication lower bounds. In Proc. 24th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), SPAA '12, pages 77--79, New York, NY, USA, 2012. ACM.
[3]
G. Ballard, J. Demmel, O. Holtz, B. Lipshitz, and O. Schwartz. Communication-optimal parallel algorithm for Strassen's matrix multiplication. Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2012, 2012.
[4]
G. Ballard, J. Demmel, O. Holtz, B. Lipshitz, and O. Schwartz. Graph expansion analysis for communication costs of fast rectangular matrix multiplication. Design and Analysis of Algorithms, 7659:13--36, 2012.
[5]
G. Ballard, J. Demmel, O. Holtz, and O. Schwartz. Minimizing communication in numerical linear algebra. SIAM J. Matrix Anal. & Appl., 32(3):866--901, 2011.
[6]
G. Ballard, J. Demmel, O. Holtz, and O. Schwartz. Graph expansion and communication costs of fast matrix multiplication. Journal of the ACM, 59(6), 2012.
[7]
G. Bilardi, A. Pietracaprina, and P. D'Alberto. On the space and access complexity of computation dags. Proceedings of the 26th International Workshop on Graph-Theoretic Concepts in Computer Science, London, UK, pages 47--58, 2000.
[8]
D. Bini, M. Capovani, F. Romani, and G. Lotti. $O(n^{2.7799})$ complexity for $n \times n$ approximate matrix multiplication. Information Processing Letters, 8(5):234--235, 1979.
[9]
M. Christ, J. Demmel, N. Knight, T. Scanlon, and K. A. Yelick. Communication lower bounds and optimal algorithms for programs that reference arrays - part 1. Technical Report UCB/EECS-2013--61, EECS Department, University of California, Berkeley, May 2013.
[10]
J. W. Hong and H. T. Kung. The red-blue pebble game. STOC 1981: Proceedings of the thirteenth annual ACM symposium on theory of computing, pages 326--333, 1981.
[11]
J. Hopcroft and L. Kerr. On minimizing the number of multiplications necessary for matrix multiplication. SIAM Journal on Applied Mathematics, 20(1):30--36, 1971.
[12]
D. Irony, S. Toledo, and A. Tiskin. Communication lower bounds for distributed-memory matrix multiplication. J. Parallel Distrib. Comput., 64(9):1017--1026, 2004.
[13]
L. H. Loomis and H. Whitney. An inequality related to the isoperimetric inequality. Bulletin of the American Mathematical Society, 55(10), 1949.
[14]
J. Savage. Space-time tradeoffs in memory hierarchies. Technical report, Brown University, Providence, RI, USA, 1994.
[15]
S. Winograd. On the number of multiplications required to compute certain functions. Proceedings of the National Academy of Science, 58(5), 1967.
[16]
C.-Q. Yang and B. Miller. Critical path analysis for the execution of parallel and distributed programs. Proceedings of the 8th International Conference on Distributed Computing Systems, pages 366--373, 1988.

    Published In

    SPAA '15: Proceedings of the 27th ACM symposium on Parallelism in Algorithms and Architectures
    June 2015
    362 pages
    ISBN:9781450335881
    DOI:10.1145/2755573
    General Chair: Guy Blelloch
    Program Chair: Kunal Agrawal

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. communication-avoiding algorithms
    2. fast matrix multiplication
    3. i/o-complexity

    Qualifiers

    • Research-article

    Funding Sources

    • Ministry of Science and Technology Israel
    • Israel Science Foundation

    Conference

    SPAA '15

    Acceptance Rates

    SPAA '15 Paper Acceptance Rate 31 of 131 submissions, 24%;
    Overall Acceptance Rate 447 of 1,461 submissions, 31%

    Cited By

    • (2024) Brief Announcement: Red-Blue Pebbling with Multiple Processors: Time, Communication and Memory Trade-offs. Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures, pages 285-287. DOI: 10.1145/3626183.3660269. Online publication date: 17-Jun-2024
    • (2024) Tightening I/O Lower Bounds through the Hourglass Dependency Pattern. Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures, pages 183-193. DOI: 10.1145/3626183.3659986. Online publication date: 17-Jun-2024
    • (2023) Multiplying 2 × 2 Sub-Blocks Using 4 Multiplications. Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures, pages 379-390. DOI: 10.1145/3558481.3591083. Online publication date: 17-Jun-2023
    • (2020) Matrix Multiplication, a Little Faster. Journal of the ACM, 67:1, pages 1-31. DOI: 10.1145/3364504. Online publication date: 15-Jan-2020
    • (2020) Spectral Lower Bounds on the I/O Complexity of Computation Graphs. Proceedings of the 32nd ACM Symposium on Parallelism in Algorithms and Architectures, pages 329-338. DOI: 10.1145/3350755.3400210. Online publication date: 6-Jul-2020
    • (2019) The I/O Complexity of Toom-Cook Integer Multiplication. Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2034-2052. DOI: 10.5555/3310435.3310558. Online publication date: 6-Jan-2019
    • (2019) Red-Blue Pebbling Revisited. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-22. DOI: 10.1145/3295500.3356181. Online publication date: 17-Nov-2019
    • (2019) Computation of Matrix Chain Products on Parallel Machines. 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 491-500. DOI: 10.1109/IPDPS.2019.00059. Online publication date: May-2019
    • (2019) Revisiting the I/O-Complexity of Fast Matrix Multiplication with Recomputations. 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 482-490. DOI: 10.1109/IPDPS.2019.00058. Online publication date: May-2019
    • (2017) Matrix Multiplication, a Little Faster. Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, pages 101-110. DOI: 10.1145/3087556.3087579. Online publication date: 24-Jul-2017
