DOI: 10.1145/2312005.2312021

Brief announcement: strong scaling of matrix multiplication algorithms and memory-independent communication lower bounds

Published: 25 June 2012

Abstract

A parallel algorithm has perfect strong scaling if its running time on $P$ processors is linear in $1/P$, including all communication costs. Distributed-memory parallel algorithms for matrix multiplication with perfect strong scaling have only recently been found. One is based on classical matrix multiplication (Solomonik and Demmel, 2011), and one is based on Strassen's fast matrix multiplication (Ballard, Demmel, Holtz, Lipshitz, and Schwartz, 2012). Both algorithms scale perfectly, but only up to some number of processors where the inter-processor communication no longer scales. We obtain a memory-independent communication cost lower bound on classical and Strassen-based distributed-memory matrix multiplication algorithms. These bounds imply that no classical or Strassen-based parallel matrix multiplication algorithm can strongly scale perfectly beyond the ranges already attained by the two parallel algorithms mentioned above. The memory-independent bounds and the strong scaling bounds generalize to other algorithms.
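
The relationship between the two kinds of bound can be sketched numerically. The abstract does not state the bound expressions, so the formulas below are an assumption recalled from the full technical report [3], with constant factors dropped: a memory-dependent bound of order $n^{\omega_0} / (P M^{\omega_0/2 - 1})$ words per processor and a memory-independent bound of order $n^2 / P^{2/\omega_0}$, where $\omega_0 = 3$ for classical multiplication and $\omega_0 = \log_2 7$ for Strassen, $n$ is the matrix dimension, and $M$ is the local memory size in words. All function names are hypothetical.

```python
import math

# Exponent of the arithmetic cost: 3 for classical, log2(7) for Strassen.
OMEGA_CLASSICAL = 3.0
OMEGA_STRASSEN = math.log2(7)


def memory_dependent_bound(n, P, M, omega):
    """Assumed memory-dependent lower bound (constants dropped).

    Scales as 1/P, so an algorithm attaining it strong-scales perfectly.
    """
    return n**omega / (P * M ** (omega / 2 - 1))


def memory_independent_bound(n, P, omega):
    """Assumed memory-independent lower bound (constants dropped).

    Decays more slowly than 1/P, so it eventually dominates.
    """
    return n**2 / P ** (2 / omega)


def communication_lower_bound(n, P, M, omega):
    """Both bounds hold at once, so the larger one is the binding one."""
    return max(memory_dependent_bound(n, P, M, omega),
               memory_independent_bound(n, P, omega))


def strong_scaling_limit(n, M, omega):
    """Processor count at which the two bounds cross: n^omega / M^(omega/2)."""
    return n**omega / M ** (omega / 2)
```

Under these assumptions, `P * communication_lower_bound(n, P, M, omega)` stays constant while `P` is below `strong_scaling_limit(n, M, omega)` and grows once `P` exceeds it, which is the sense in which perfect strong scaling cannot continue past that point. The sketch ignores the requirement that the matrices fit in aggregate memory (roughly $P \ge n^2 / M$).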

References

[1] Aggarwal, A., Chandra, A. K., and Snir, M. Communication complexity of PRAMs. Theoretical Computer Science 71, 1 (1990), 3--28.
[2] Ballard, G., Demmel, J., Holtz, O., Lipshitz, B., and Schwartz, O. Communication-optimal parallel algorithm for Strassen's matrix multiplication. In SPAA '12: Proceedings of the 24th Annual Symposium on Parallelism in Algorithms and Architectures (New York, NY, USA, 2012), ACM.
[3] Ballard, G., Demmel, J., Holtz, O., Lipshitz, B., and Schwartz, O. Strong scaling of matrix multiplication algorithms and memory-independent communication lower bounds. EECS Technical Report EECS-2012-31, UC Berkeley, Mar. 2012.
[4] Ballard, G., Demmel, J., Holtz, O., and Schwartz, O. Graph expansion and communication costs of fast matrix multiplication. In SPAA '11: Proceedings of the 23rd Annual Symposium on Parallelism in Algorithms and Architectures (New York, NY, USA, 2011), ACM, pp. 1--12.
[5] Ballard, G., Demmel, J., Holtz, O., and Schwartz, O. Minimizing communication in numerical linear algebra. SIAM J. Matrix Analysis Applications 32, 3 (2011), 866--901.
[6] Irony, D., Toledo, S., and Tiskin, A. Communication lower bounds for distributed-memory matrix multiplication. J. Parallel Distrib. Comput. 64, 9 (2004), 1017--1026.
[7] Loomis, L. H., and Whitney, H. An inequality related to the isoperimetric inequality. Bulletin of the AMS 55 (1949), 961--962.
[8] Solomonik, E., and Demmel, J. Communication-optimal parallel 2.5D matrix multiplication and LU factorization algorithms. In Euro-Par '11: Proceedings of the 17th International European Conference on Parallel and Distributed Computing (2011), Springer.

    Published In

    SPAA '12: Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
    June 2012
    348 pages
    ISBN: 9781450312134
    DOI: 10.1145/2312005

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. communication-avoiding algorithms
    2. fast matrix multiplication
    3. strong scaling

    Qualifiers

    • Abstract

    Conference

    SPAA '12

    Acceptance Rates

    Overall Acceptance Rate 447 of 1,461 submissions, 31%

    Cited By

    • (2023) Parallel Memory-Independent Communication Bounds for SYRK. Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures, 391-401. DOI: 10.1145/3558481.3591072. Online publication date: 17-Jun-2023
    • (2021) Processor-Aware Cache-Oblivious Algorithms✱. Proceedings of the 50th International Conference on Parallel Processing, 1-10. DOI: 10.1145/3472456.3472506. Online publication date: 9-Aug-2021
    • (2020) Improving the Space-Time Efficiency of Matrix Multiplication Algorithms. Workshop Proceedings of the 49th International Conference on Parallel Processing, 1-10. DOI: 10.1145/3409390.3409404. Online publication date: 17-Aug-2020
    • (2020) Balanced Partitioning of Several Cache-Oblivious Algorithms. Proceedings of the 32nd ACM Symposium on Parallelism in Algorithms and Architectures, 575-577. DOI: 10.1145/3350755.3400214. Online publication date: 6-Jul-2020
    • (2020) Cartesian Partitioning Models for 2D and 3D Parallel SpGEMM Algorithms. IEEE Transactions on Parallel and Distributed Systems 31:12, 2763-2775. DOI: 10.1109/TPDS.2020.3000708. Online publication date: 1-Dec-2020
    • (2019) Trade-offs between computation, communication, and synchronization in stencil-collective alternate update. CCF Transactions on High Performance Computing 1:2, 144-160. DOI: 10.1007/s42514-019-00011-x. Online publication date: 26-Jul-2019
    • (2018) Six Pass MapReduce Implementation of Strassen's Algorithm for Matrix Multiplication. Proceedings of the 5th ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond, 1-6. DOI: 10.1145/3206333.3206336. Online publication date: 15-Jun-2018
    • (2018) Communication Lower Bounds for Matricized Tensor Times Khatri-Rao Product. 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 557-567. DOI: 10.1109/IPDPS.2018.00065. Online publication date: May-2018
    • (2018) A parallel algorithm for matrix fast exponentiation based on MPI. 2018 IEEE 3rd International Conference on Big Data Analysis (ICBDA), 162-165. DOI: 10.1109/ICBDA.2018.8367669. Online publication date: Mar-2018
    • (2017) MapReduce Implementation of Strassen's Algorithm for Matrix Multiplication. Proceedings of the 4th ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond, 1-10. DOI: 10.1145/3070607.3070614. Online publication date: 14-May-2017
