DOI: 10.5555/2388996.2389133
research-article

Communication-Avoiding Parallel Strassen: Implementation and Performance

Published: 10 November 2012

    Abstract

    Matrix multiplication is a fundamental kernel of many high-performance and scientific computing applications. Most parallel implementations use classical O(n^3) matrix multiplication, even though algorithms with lower arithmetic complexity exist. We recently presented a new Communication-Avoiding Parallel Strassen algorithm (CAPS), based on Strassen's fast matrix multiplication, that minimizes communication (SPAA '12). It communicates asymptotically less than all classical and all previous Strassen-based algorithms, and it attains theoretical lower bounds.
    In this paper we show that CAPS is also faster in practice. We benchmark and compare its performance to previous algorithms on Hopper (Cray XE6), Intrepid (IBM BG/P), and Franklin (Cray XT4). We demonstrate significant speedups over previous algorithms both for large matrices and for small matrices on large numbers of processors. We model and analyze the performance of CAPS and predict its performance on future exascale platforms.
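
    To make the arithmetic saving behind Strassen-based methods concrete, here is a minimal sequential sketch of Strassen's recursion in pure Python. It is not the CAPS algorithm and contains no parallelism or communication scheduling; it only illustrates how seven recursive block products replace the eight used by the classical method. The function names (`classical_matmul`, `strassen`), the `cutoff` parameter, and the restriction to square matrices whose dimension is a power of two are assumptions made for this illustration.

```python
def classical_matmul(A, B):
    """Classical O(n^3) product of two n x n matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def _add(X, Y):
    """Entrywise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def _sub(X, Y):
    """Entrywise difference of two equally sized matrices."""
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def _quadrants(M):
    """Split an n x n matrix (n even) into its four n/2 x n/2 blocks."""
    h = len(M) // 2
    top, bottom = M[:h], M[h:]
    return ([r[:h] for r in top], [r[h:] for r in top],
            [r[:h] for r in bottom], [r[h:] for r in bottom])


def strassen(A, B, cutoff=1):
    """Sequential Strassen product (illustrative sketch, not CAPS).

    Assumes A and B are square with a power-of-two dimension. Blocks of
    size <= cutoff are multiplied with the classical method.
    """
    n = len(A)
    if n == 0 or (n & (n - 1)) != 0:
        raise ValueError("dimension must be a power of two")
    if n <= max(cutoff, 1):
        return classical_matmul(A, B)

    A11, A12, A21, A22 = _quadrants(A)
    B11, B12, B21, B22 = _quadrants(B)

    # Seven recursive block products instead of the classical eight.
    M1 = strassen(_add(A11, A22), _add(B11, B22), cutoff)
    M2 = strassen(_add(A21, A22), B11, cutoff)
    M3 = strassen(A11, _sub(B12, B22), cutoff)
    M4 = strassen(A22, _sub(B21, B11), cutoff)
    M5 = strassen(_add(A11, A12), B22, cutoff)
    M6 = strassen(_sub(A21, A11), _add(B11, B12), cutoff)
    M7 = strassen(_sub(A12, A22), _add(B21, B22), cutoff)

    # Recombine the products into the four blocks of C = A * B.
    C11 = _add(_sub(_add(M1, M4), M5), M7)
    C12 = _add(M3, M5)
    C21 = _add(M2, M4)
    C22 = _add(_add(_sub(M1, M2), M3), M6)

    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom
```

    Each level of recursion performs 7 half-size products plus O(n^2) additions, so the operation count satisfies T(n) = 7 T(n/2) + O(n^2), which gives O(n^(log2 7)), roughly O(n^2.81), compared with O(n^3) for the classical method. A practical implementation would choose a much larger `cutoff` and call an optimized classical kernel at the base of the recursion.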

    References

    [1]
    M. D. Adams and D. S. Wise. Seven at one stroke: Results from a cache-oblivious paradigm for scalable matrix algorithms. In MSPC '06: Proceedings of the 2006 Workshop on Memory System Performance and Correctness, pages 41--50, New York, NY, USA, 2006. ACM.
    [2]
    R. C. Agarwal, S. M. Balle, F. G. Gustavson, M. Joshi, and P. Palkar. A three-dimensional approach to parallel matrix multiplication. IBM Journal of Research and Development, 39(5):575--582, 1995.
    [3]
    G. Ballard, J. Demmel, O. Holtz, B. Lipshitz, and O. Schwartz. Brief announcement: strong scaling of matrix multiplication algorithms and memory-independent communication lower bounds. In Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA '12, pages 77--79, New York, NY, USA, 2012. ACM.
    [4]
    G. Ballard, J. Demmel, O. Holtz, B. Lipshitz, and O. Schwartz. Communication-optimal parallel algorithm for Strassen's matrix multiplication. In Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA '12, pages 193--204, New York, NY, USA, 2012. ACM.
    [5]
    G. Ballard, J. Demmel, O. Holtz, and O. Schwartz. Graph expansion and communication costs of fast matrix multiplication. In SPAA '11: Proceedings of the 23rd Annual Symposium on Parallelism in Algorithms and Architectures, pages 1--12, New York, NY, USA, 2011. ACM.
    [6]
    G. Ballard, J. Demmel, O. Holtz, and O. Schwartz. Minimizing communication in numerical linear algebra. SIAM J. Matrix Analysis Applications, 32(3):866--901, 2011.
    [7]
    J. Berntsen. Communication efficient matrix multiplication on hypercubes. Parallel Computing, 12(3):335--342, 1989.
    [8]
    L. Cannon. A cellular computer to implement the Kalman filter algorithm. PhD thesis, Montana State University, Bozeman, MT, 1969.
    [9]
    J. Demmel, I. Dumitriu, and O. Holtz. Fast linear algebra is stable. Numerische Mathematik, 108(1):59--91, 2007.
    [10]
    M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious algorithms. In FOCS '99: Proceedings of the 40th Annual Symposium on Foundations of Computer Science, page 285, Washington, DC, USA, 1999. IEEE Computer Society.
    [11]
    B. Grayson, A. Shah, and R. van de Geijn. A high performance parallel Strassen implementation. In Parallel Processing Letters, volume 6, pages 3--12, 1995.
    [12]
    N. J. Higham. Accuracy and Stability of Numerical Algorithms. SIAM, Philadelphia, PA, 2nd edition, 2002.
    [13]
    D. Irony, S. Toledo, and A. Tiskin. Communication lower bounds for distributed-memory matrix multiplication. J. Parallel Distrib. Comput., 64(9):1017--1026, 2004.
    [14]
    Q. Luo and J. Drake. A scalable parallel Strassen's matrix multiplication algorithm for distributed-memory computers. In Proceedings of the 1995 ACM Symposium on Applied Computing, SAC '95, pages 221--226, New York, NY, USA, 1995. ACM.
    [15]
    W. F. McColl and A. Tiskin. Memory-efficient matrix multiplication in the BSP model. Algorithmica, 24:287--297, 1999. doi:10.1007/PL00008264.
    [16]
    H. Meuer, E. Strohmaier, J. Dongarra, and H. Simon. Top500 supercomputer sites, 2011. www.top500.org.
    [17]
    J. Shalf, S. S. Dosanjh, and J. Morrison. Exascale computing technology challenges. In J. M. L. M. Palma, M. J. Daydé, O. Marques, and J. C. Lopes, editors, High Performance Computing for Computational Science - VECPAR 2010 - 9th International conference, Berkeley, CA, USA, June 22-25, 2010, Revised Selected Papers, volume 6449 of Lecture Notes in Computer Science, pages 1--25. Springer, 2010.
    [18]
    E. Solomonik, A. Bhatele, and J. Demmel. Improving communication performance in dense linear algebra via topology aware collectives. In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pages 77:1--77:11, New York, NY, USA, 2011. ACM.
    [19]
    E. Solomonik and J. Demmel. Communication-optimal parallel 2.5D matrix multiplication and LU factorization algorithms. Technical Report UCB/EECS-2011-10, EECS Department, University of California, Berkeley, Feb 2011.
    [20]
    R. A. van de Geijn and J. Watts. SUMMA: scalable universal matrix multiplication algorithm. Concurrency - Practice and Experience, 9(4):255--274, 1997.


          Published In

          SC '12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
          November 2012
          1161 pages
          ISBN:9781467308045

          Publisher

          IEEE Computer Society Press

          Washington, DC, United States

          Qualifiers

          • Research-article

          Conference

          SC '12

          Acceptance Rates

          SC '12 Paper Acceptance Rate 100 of 461 submissions, 22%;
          Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

          Article Metrics

          • Downloads (last 12 months): 3
          • Downloads (last 6 weeks): 0
          Reflects downloads up to 10 Aug 2024

          Cited By

          • Probabilistic tensors and opportunistic boolean matrix multiplication. Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 496--515, 2019. DOI: 10.5555/3310435.3310466. Online publication date: 6-Jan-2019.
          • Faster Matrix Multiplication via Sparse Decomposition. The 31st ACM Symposium on Parallelism in Algorithms and Architectures, pages 11--22, 2019. DOI: 10.1145/3323165.3323188. Online publication date: 17-Jun-2019.
          • Six Pass MapReduce Implementation of Strassen's Algorithm for Matrix Multiplication. Proceedings of the 5th ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond, pages 1--6, 2018. DOI: 10.1145/3206333.3206336. Online publication date: 15-Jun-2018.
          • MapReduce Implementation of Strassen's Algorithm for Matrix Multiplication. Proceedings of the 4th ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond, pages 1--10, 2017. DOI: 10.1145/3070607.3070614. Online publication date: 14-May-2017.
          • Strassen's algorithm reloaded. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1--12, 2016. DOI: 10.5555/3014904.3014983. Online publication date: 13-Nov-2016.
          • A framework for practical parallel fast matrix multiplication. ACM SIGPLAN Notices, 50(8):42--53, 2015. DOI: 10.1145/2858788.2688513. Online publication date: 24-Jan-2015.
          • A framework for practical parallel fast matrix multiplication. Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 42--53, 2015. DOI: 10.1145/2688500.2688513. Online publication date: 24-Jan-2015.
          • Faster all-pairs shortest paths via circuit complexity. Proceedings of the forty-sixth annual ACM symposium on Theory of computing, pages 664--673, 2014. DOI: 10.1145/2591796.2591811. Online publication date: 31-May-2014.
          • Communication costs of Strassen's matrix multiplication. Communications of the ACM, 57(2):107--114, 2014. DOI: 10.1145/2556647.2556660. Online publication date: 1-Feb-2014.
          • A communication-optimal framework for contracting distributed tensors. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 375--386, 2014. DOI: 10.1109/SC.2014.36. Online publication date: 16-Nov-2014.
