DOI: 10.1145/1989493.1989495
Graph expansion and communication costs of fast matrix multiplication: regular submission

Published: 04 June 2011

Abstract

The communication cost of algorithms (also known as I/O-complexity) is shown to be closely related to the expansion properties of the corresponding computation graphs. We demonstrate this on Strassen's and other fast matrix multiplication algorithms, and obtain the first lower bounds on their communication costs. For sequential algorithms, these bounds are attainable and therefore optimal.
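The abstract states the bounds without formulas. For Strassen's algorithm in the sequential two-level memory model (fast memory of M words), the lower bound has the form Omega((n / sqrt(M))^(log2 7) * M) words moved. The Python sketch below is a toy illustration of the "attainable" half of that claim: it counts the words moved by a simple recursive Strassen schedule and compares the count with that expression. The cost model (3n^2 words for a subproblem that fits in fast memory, 18 streamed block additions per recursion level, each reading two (n/2) x (n/2) blocks and writing one) is a simplifying assumption, not the schedule or constants from the paper. The names `strassen_io` and `strassen_bound_shape` are hypothetical.

```python
import math

# Exponent of Strassen's algorithm: omega_0 = log2(7) ~= 2.807.
OMEGA0 = math.log2(7)


def strassen_io(n: int, M: int) -> int:
    """Words moved between slow and fast memory by a toy recursive
    Strassen schedule (hypothetical cost model, not the paper's).

    - If the three n x n blocks of A, B and C fit in fast memory
      (3*n*n <= M), read A and B once and write C once: 3*n*n words.
    - Otherwise split into (n/2) x (n/2) quadrants, recurse on the
      7 Strassen products, and stream the 18 block additions and
      subtractions; each reads two blocks and writes one.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    if M < 3:
        raise ValueError("fast memory must hold at least 3 words")
    if 3 * n * n <= M:
        return 3 * n * n
    h = n // 2
    return 7 * strassen_io(h, M) + 18 * 3 * h * h


def strassen_bound_shape(n: int, M: int) -> float:
    """(n / sqrt(M))**omega_0 * M: the form of the sequential lower
    bound for Strassen's algorithm, with constants dropped."""
    return (n / math.sqrt(M)) ** OMEGA0 * M


if __name__ == "__main__":
    M = 3 * 64 * 64  # fast memory holds three 64 x 64 blocks
    for n in (64, 128, 256, 512, 1024, 2048):
        io = strassen_io(n, M)
        ratio = io / strassen_bound_shape(n, M)
        print(f"n={n:5d}  words={io:12d}  ratio={ratio:6.2f}")
```

With M = 3 * 64 * 64, the ratio rises from about 4.7 at n = 64 and levels off below 33 as n grows. In this toy model, the schedule's traffic therefore stays within a constant factor of the bound's shape. The lower bound itself, which the paper derives from graph expansion, is not something a simulation like this can establish.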

Cited By

  • (2023) Multiplying 2 × 2 Sub-Blocks Using 4 Multiplications. Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures, pages 379-390. DOI: 10.1145/3558481.3591083. Online publication date: 17-Jun-2023.
  • (2021) 3-D Partitioning for Large-Scale Graph Processing. IEEE Transactions on Computers 70:1, pages 111-127. DOI: 10.1109/TC.2020.2986736. Online publication date: 1-Jan-2021.
  • (2020) MemFlow: Memory-Driven Data Scheduling With Datapath Co-Design in Accelerators for Large-Scale Inference Applications. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 39:9, pages 1875-1888. DOI: 10.1109/TCAD.2019.2925377. Online publication date: Sep-2020.

    Published In

    SPAA '11: Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
    June 2011
    404 pages
    ISBN: 9781450307437
    DOI: 10.1145/1989493
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

    In-Cooperation

    • EATCS: European Association for Theoretical Computer Science

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. communication avoiding algorithms
    2. fast matrix multiplication
    3. i/o-complexity

    Qualifiers

    • Research-article

    Conference

    SPAA '11

    Acceptance Rates

    Overall Acceptance Rate 447 of 1,461 submissions, 31%

    Bibliometrics & Citations

    Article Metrics

    • Downloads (last 12 months): 2
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 15 Oct 2024.

    Citations

    • (2018) Holistic Approaches to HPC Power and Workflow Management. 2018 Ninth International Green and Sustainable Computing Conference (IGSC), pages 1-8. DOI: 10.1109/IGCC.2018.8752150. Online publication date: Oct-2018.
    • (2017) Introduction to Communication Avoiding Algorithms for Direct Methods of Factorization in Linear Algebra. Computational Mathematics, Numerical Analysis and Applications, pages 153-185. DOI: 10.1007/978-3-319-49631-3_4. Online publication date: 5-Aug-2017.
    • (2016) Exploring the hidden dimension in graph processing. Proceedings of the 12th USENIX conference on Operating Systems Design and Implementation, pages 285-300. DOI: 10.5555/3026877.3026900. Online publication date: 2-Nov-2016.
    • (2015) Communication Avoiding Algorithms. Proceedings of the 2015 International Conference on Parallel Architecture and Compilation (PACT), pages 150-162. DOI: 10.1109/PACT.2015.41. Online publication date: 18-Oct-2015.
    • (2014) Communication costs of Strassen's matrix multiplication. Communications of the ACM 57:2, pages 107-114. DOI: 10.1145/2556647.2556660. Online publication date: 1-Feb-2014.
    • (2013) Beyond reuse distance analysis. ACM Transactions on Architecture and Code Optimization 10:4, pages 1-29. DOI: 10.1145/2541228.2555309. Online publication date: 1-Dec-2013.
    • (2013) Graph expansion and communication costs of fast matrix multiplication. Journal of the ACM 59:6, pages 1-23. DOI: 10.1145/2395116.2395121. Online publication date: 9-Jan-2013.