DOI: 10.1145/2755573.2755594
research-article

Matrix Multiplication I/O-Complexity by Path Routing

Published: 13 June 2015

    Abstract

    We apply a novel technique based on path routings to obtain optimal I/O-complexity lower bounds for all Strassen-like fast matrix multiplication algorithms, computed in serial or in parallel, assuming no reuse of nontrivial intermediate linear combinations. Under this assumption, given fast memory of size $M$, we prove an I/O-complexity lower bound of $\Omega\left((n/\sqrt{M})^{\omega_0} \cdot M\right)$ for any Strassen-like matrix multiplication algorithm of arithmetic complexity $\Theta(n^{\omega_0})$, with $\omega_0 < 3$, applied to $n \times n$ matrices. This generalizes an approach by Ballard, Demmel, Holtz, and Schwartz that provides a tight lower bound for Strassen's matrix multiplication algorithm but does not apply to algorithms with disconnected encoding or decoding components of the underlying computation graph, or to algorithms with multiply copied values. We overcome these challenges via a new graph-theoretical approach for proving I/O-complexity lower bounds without the use of edge expansions.
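To get a feel for how the bound in the abstract scales, the following minimal sketch evaluates the expression inside the Omega, $(n/\sqrt{M})^{\omega_0} \cdot M$, with all hidden constants dropped. The function name `io_lower_bound`, the parameter names, and the sample sizes are illustrative assumptions, not taken from the paper. Strassen's algorithm has exponent $\omega_0 = \log_2 7$; the exponent 3 case lies outside the stated range $\omega_0 < 3$ and is included only as a comparison point, where the expression reduces to $n^3/\sqrt{M}$.

```python
import math


def io_lower_bound(n: int, fast_memory: int, omega0: float) -> float:
    """Evaluate (n / sqrt(M))**omega0 * M, the expression inside the Omega(...).

    Constant factors hidden by the asymptotic notation are ignored, so the
    result is only meaningful for comparing growth rates.
    """
    if n <= 0 or fast_memory <= 0:
        raise ValueError("n and fast_memory must be positive")
    return (n / math.sqrt(fast_memory)) ** omega0 * fast_memory


# Strassen's algorithm multiplies 2 x 2 blocks with 7 products: omega0 = log2(7).
STRASSEN_OMEGA0 = math.log2(7)

if __name__ == "__main__":
    n, M = 4096, 1024  # illustrative sizes, not taken from the paper
    for label, omega0 in [("Strassen", STRASSEN_OMEGA0), ("exponent 3", 3.0)]:
        print(f"{label:>10}: {io_lower_bound(n, M, omega0):.3e}")
```

Because the base $n/\sqrt{M}$ exceeds 1 whenever the matrices do not fit in fast memory, a smaller exponent $\omega_0$ gives a smaller value of the expression, which is consistent with faster algorithms admitting lower communication bounds.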

    References

    [1] G. Ballard, E. Carson, J. Demmel, M. Hoemmen, N. Knight, and O. Schwartz. Communication lower bounds and optimal algorithms for numerical linear algebra. Acta Numerica, 23:1–155, May 2014.
    [2] G. Ballard, J. Demmel, O. Holtz, B. Lipshitz, and O. Schwartz. Brief announcement: strong scaling of matrix multiplication algorithms and memory-independent communication lower bounds. In Proc. 24th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), SPAA '12, pages 77–79, New York, NY, USA, 2012. ACM.
    [3] G. Ballard, J. Demmel, O. Holtz, B. Lipshitz, and O. Schwartz. Communication-optimal parallel algorithm for Strassen's matrix multiplication. Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2012, 2012.
    [4] G. Ballard, J. Demmel, O. Holtz, B. Lipshitz, and O. Schwartz. Graph expansion analysis for communication costs of fast rectangular matrix multiplication. Design and Analysis of Algorithms, 7659:13–36, 2012.
    [5] G. Ballard, J. Demmel, O. Holtz, and O. Schwartz. Minimizing communication in numerical linear algebra. SIAM J. Matrix Anal. & Appl., 32(3):866–901, 2011.
    [6] G. Ballard, J. Demmel, O. Holtz, and O. Schwartz. Graph expansion and communication costs of fast matrix multiplication. Journal of the ACM, 59(6), 2012.
    [7] G. Bilardi, A. Pietracaprina, and P. D'Alberto. On the space and access complexity of computation dags. Proceedings of the 26th International Workshop on Graph-Theoretic Concepts in Computer Science, London, UK, pages 47–58, 2000.
    [8] D. Bini, M. Capovani, F. Romani, and G. Lotti. $O(n^{2.7799})$ complexity for $n \times n$ approximate matrix multiplication. Information Processing Letters, 8(5):234–235, 1979.
    [9] M. Christ, J. Demmel, N. Knight, T. Scanlon, and K. A. Yelick. Communication lower bounds and optimal algorithms for programs that reference arrays - part 1. Technical Report UCB/EECS-2013-61, EECS Department, University of California, Berkeley, May 2013.
    [10] J. W. Hong and H. T. Kung. The red-blue pebble game. STOC 1981: Proceedings of the thirteenth annual ACM symposium on theory of computing, pages 326–333, 1981.
    [11] J. Hopcroft and L. Kerr. On minimizing the number of multiplications necessary for matrix multiplication. SIAM Journal on Applied Mathematics, 20(1):30–36, 1971.
    [12] D. Irony, S. Toledo, and A. Tiskin. Communication lower bounds for distributed-memory matrix multiplication. J. Parallel Distrib. Comput., 64(9):1017–1026, 2004.
    [13] L. H. Loomis and H. Whitney. An inequality related to the isoperimetric inequality. Bulletin of the American Mathematical Society, 55(10), 1949.
    [14] J. Savage. Space-time tradeoffs in memory hierarchies. Technical report, Brown University, Providence, RI, USA, 1994.
    [15] S. Winograd. On the number of multiplications required to compute certain functions. Proceedings of the National Academy of Science, 58(5), 1967.
    [16] C.-Q. Yang and B. Miller. Critical path analysis for the execution of parallel and distributed programs. Proceedings of the 8th International Conference on Distributed Computing Systems, pages 366–373, 1988.



      Published In

      SPAA '15: Proceedings of the 27th ACM symposium on Parallelism in Algorithms and Architectures
      June 2015
      362 pages
      ISBN:9781450335881
      DOI:10.1145/2755573
      • General Chair: Guy Blelloch
      • Program Chair: Kunal Agrawal


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. communication-avoiding algorithms
      2. fast matrix multiplication
      3. i/o-complexity

      Qualifiers

      • Research-article

      Funding Sources

      • Ministry of Science and Technology Israel
      • Israel Science Foundation

      Conference

      SPAA '15

      Acceptance Rates

      SPAA '15 Paper Acceptance Rate 31 of 131 submissions, 24%;
      Overall Acceptance Rate 447 of 1,461 submissions, 31%


      Cited By

      • (2024) Brief Announcement: Red-Blue Pebbling with Multiple Processors: Time, Communication and Memory Trade-offs. Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures, pages 285–287. DOI: 10.1145/3626183.3660269. Online publication date: 17-Jun-2024
      • (2024) Tightening I/O Lower Bounds through the Hourglass Dependency Pattern. Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures, pages 183–193. DOI: 10.1145/3626183.3659986. Online publication date: 17-Jun-2024
      • (2023) Multiplying 2 × 2 Sub-Blocks Using 4 Multiplications. Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures, pages 379–390. DOI: 10.1145/3558481.3591083. Online publication date: 17-Jun-2023
      • (2020) Matrix Multiplication, a Little Faster. Journal of the ACM, 67:1, pages 1–31. DOI: 10.1145/3364504. Online publication date: 15-Jan-2020
      • (2020) Spectral Lower Bounds on the I/O Complexity of Computation Graphs. Proceedings of the 32nd ACM Symposium on Parallelism in Algorithms and Architectures, pages 329–338. DOI: 10.1145/3350755.3400210. Online publication date: 6-Jul-2020
      • (2019) The I/O complexity of Toom-Cook integer multiplication. Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2034–2052. DOI: 10.5555/3310435.3310558. Online publication date: 6-Jan-2019
      • (2019) Red-blue pebbling revisited. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1–22. DOI: 10.1145/3295500.3356181. Online publication date: 17-Nov-2019
      • (2019) Computation of Matrix Chain Products on Parallel Machines. 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 491–500. DOI: 10.1109/IPDPS.2019.00059. Online publication date: May-2019
      • (2019) Revisiting the I/O-Complexity of Fast Matrix Multiplication with Recomputations. 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 482–490. DOI: 10.1109/IPDPS.2019.00058. Online publication date: May-2019
      • (2017) Matrix Multiplication, a Little Faster. Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, pages 101–110. DOI: 10.1145/3087556.3087579. Online publication date: 24-Jul-2017
