DOI: 10.1145/3558481.3591072

Parallel Memory-Independent Communication Bounds for SYRK

Published: 17 June 2023

Abstract

In this paper, we focus on the parallel communication cost of multiplying a matrix with its transpose, known as a symmetric rank-k update (SYRK). SYRK requires half the computation of general matrix multiplication because of the symmetry of the output matrix. Recent work (Beaumont et al., SPAA '22) has demonstrated that the sequential I/O complexity of SYRK is also a constant factor smaller than that of general matrix multiplication. Inspired by this progress, we establish memory-independent parallel communication lower bounds for SYRK with smaller constants than general matrix multiplication, and we show that these constants are tight by presenting communication-optimal algorithms. The crux of the lower bound proof relies on extending a key geometric inequality to symmetric computations and analytically solving a constrained nonlinear optimization problem. The optimal algorithms use a triangular blocking scheme for parallel distribution of the symmetric output matrix and corresponding computation.
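The claim that SYRK needs about half the computation of general matrix multiplication can be illustrated with a small sketch. The code below is only an illustration of the symmetry argument, not the paper's algorithm: the function names `syrk_lower` and `gemm_mult_count` are hypothetical, and the sketch says nothing about communication cost or the triangular blocking scheme. It computes only the entries of C = A·Aᵀ on or below the diagonal and counts scalar multiplications.

```python
def syrk_lower(A):
    """Compute the lower triangle of C = A * A^T for an n-by-k matrix A.

    Returns (C, mults), where C is an n-by-n list of lists whose strictly
    upper-triangular entries are left at zero (they equal C[j][i] by
    symmetry), and mults is the number of scalar multiplications performed.
    """
    n = len(A)
    k = len(A[0]) if n else 0
    C = [[0.0] * n for _ in range(n)]
    mults = 0
    for i in range(n):
        for j in range(i + 1):  # only entries on or below the diagonal
            acc = 0.0
            for p in range(k):
                acc += A[i][p] * A[j][p]
                mults += 1
            C[i][j] = acc
    return C, mults


def gemm_mult_count(n, k):
    """Multiplications used by a classical product of n-by-k and k-by-n matrices."""
    return n * n * k


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    C, mults = syrk_lower(A)
    for row in C:
        print(row)
    print("SYRK multiplications:", mults)
    print("GEMM multiplications:", gemm_mult_count(len(A), len(A[0])))
```

For an n-by-k input the sketch performs n(n+1)k/2 multiplications, compared with n²k for the classical general product, so the ratio approaches 1/2 as n grows.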

References

[1]
A. Aggarwal, A. K. Chandra, and M. Snir. 1990. Communication Complexity of PRAMs. Theor. Comp. Sci., Vol. 71, 1 (1990). https://doi.org/10.1016/0304-3975(90)90188-N
[2]
H. Al Daas, G. Ballard, L. Grigori, S. Kumar, and K. Rouse. 2022. Tight Memory-Independent Parallel Matrix Multiplication Communication Lower Bounds. In SPAA 2022. https://doi.org/10.1145/3490148.3538552
[3]
G. Ballard, E. Carson, J. Demmel, M. Hoemmen, N. Knight, and O. Schwartz. 2014. Communication Lower Bounds and Optimal Algorithms for Numerical Linear Algebra. Acta Numerica, Vol. 23 (2014). https://doi.org/10.1017/S0962492914000038
[4]
G. Ballard, J. Demmel, O. Holtz, B. Lipshitz, and O. Schwartz. 2012. Strong Scaling of Matrix Multiplication Algorithms and Memory-Independent Communication Lower Bounds. In SPAA 2012. https://doi.org/10.1145/2312005.2312021
[5]
G. Ballard, J. Demmel, O. Holtz, and O. Schwartz. 2011. Minimizing Communication in Numerical Linear Algebra. SIAM J. Matrix Anal. Appl., Vol. 32, 3 (2011). https://doi.org/10.1137/090769156
[6]
O. Beaumont, P. Duchon, L. Eyraud-Dubois, J. Langou, and M. Vérité. 2022a. Symmetric Block-Cyclic Distribution: Fewer Communications Leads to Faster Dense Cholesky Factorization. In SC 2022. https://dl.acm.org/doi/abs/10.5555/3571885.3571923
[7]
O. Beaumont, L. Eyraud-Dubois, J. Langou, and M. Vérité. 2022b. I/O-optimal Algorithms for Symmetric Linear Algebra Kernels. In SPAA 2022. https://doi.org/10.1145/3490148.3538587
[8]
L. S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley. 1997. ScaLAPACK Users' Guide. SIAM. Also available from http://www.netlib.org/scalapack/.
[9]
S. Boyd and L. Vandenberghe. 2004. Convex Optimization. Cambridge University Press. https://web.stanford.edu/~boyd/cvxbook/
[10]
J. Bruck, Ching-Tien Ho, S. Kipnis, E. Upfal, and D. Weathersby. 1997. Efficient Algorithms for All-to-All Communications in Multiport Message-Passing Systems. IEEE Trans. on Par. and Dist. Sys., Vol. 8, 11 (1997). https://doi.org/10.1109/71.642949
[11]
E. Chan, M. Heimlich, A. Purkayastha, and R. van de Geijn. 2007. Collective Communication: Theory, Practice, and Experience. Conc. and Comp.: Prac. and Exper., Vol. 19, 13 (2007). https://doi.org/10.1002/cpe.1206
[12]
J. Demmel, D. Eliahu, A. Fox, S. Kamil, B. Lipshitz, O. Schwartz, and O. Spillinger. 2013. Communication-Optimal Parallel Recursive Rectangular Matrix Multiplication. In IPDPS 2013. https://doi.org/10.1109/IPDPS.2013.80
[13]
J. Dongarra, J.-F. Pineau, Y. Robert, Z. Shi, and F. Vivien. 2008. Revisiting Matrix Product on Master-Worker Platforms. Intl. J. Found. of Comp. Sci., Vol. 19, 06 (2008). https://doi.org/10.1142/S0129054108006303
[14]
J. W. Hong and H. T. Kung. 1981. I/O complexity: The Red-Blue Pebble Game. In STOC 1981. https://doi.org/10.1145/800076.802486
[15]
D. Irony, S. Toledo, and A. Tiskin. 2004. Communication Lower Bounds for Distributed-Memory Matrix Multiplication. J. Par. and Dist. Comp., Vol. 64, 9 (2004). https://doi.org/10.1016/j.jpdc.2004.03.021
[16]
G. Kwasniewski, M. Kabic, T. Ben-Nun, A. N. Ziogas, J. E. Saethre, A. Gaillard, T. Schneider, M. Besta, A. Kozhevnikov, J. VandeVondele, and T. Hoefler. 2021. On the Parallel I/O Optimality of Linear Algebra Kernels: Near-Optimal Matrix Factorizations. In SC 2021. https://doi.org/10.1145/3458817.3476167
[17]
L. H. Loomis and H. Whitney. 1949. An Inequality Related to the Isoperimetric Inequality. Bull. Amer. Math. Soc., Vol. 55, 10 (1949). https://doi.org/10.1090/S0002-9904-1949-09320-5
[18]
A. Olivry, J. Langou, L.-N. Pouchet, P. Sadayappan, and F. Rastello. 2020. Automated Derivation of Parametric Data Movement Lower Bounds for Affine Programs. In PLDI 2020. https://doi.org/10.1145/3385412.3385989
[19]
A. Olivry, G. Iooss, N. Tollenaere, A. Rountev, P. Sadayappan, and F. Rastello. 2021. IOOpt: Automatic Derivation of I/O Complexity Bounds for Affine Programs. In PLDI 2021. https://doi.org/10.1145/3453483
[20]
J. Poulson, B. Marker, R. A. van de Geijn, J. R. Hammond, and N. A. Romero. 2013. Elemental: A New Framework for Distributed Memory Dense Matrix Computations. ACM Trans. on Math. Soft., Vol. 39, 2, Article 13 (2013). https://doi.org/10.1145/2427023.2427030
[21]
T. M. Smith, B. Lowery, J. Langou, and R. A. van de Geijn. 2019. A Tight I/O Lower Bound for Matrix Multiplication. Technical Report. arXiv. https://doi.org/10.48550/arXiv.1702.02017
[22]
R. Thakur, R. Rabenseifner, and W. Gropp. 2005. Optimization of Collective Communication Operations in MPICH. Intl. J. High Perf. Comp. App., Vol. 19, 1 (2005). https://doi.org/10.1177/1094342005051521

Published In

SPAA '23: Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures
June 2023
504 pages
ISBN: 9781450395458
DOI: 10.1145/3558481
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. communication costs
2. convex optimization
3. symmetric matrices

Qualifiers

• Research-article

Acceptance Rates

Overall Acceptance Rate: 447 of 1,461 submissions, 31%
