Parallel Memory-Independent Communication Bounds for SYRK

Authors:

Hussam Al Daas,

Kathryn Rouse

SPAA '23: Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures

Pages 391 - 401

https://doi.org/10.1145/3558481.3591072

Published: 17 June 2023

Abstract

In this paper, we focus on the parallel communication cost of multiplying a matrix with its transpose, known as a symmetric rank-k update (SYRK). SYRK requires half the computation of general matrix multiplication because of the symmetry of the output matrix. Recent work (Beaumont et al., SPAA '22) has demonstrated that the sequential I/O complexity of SYRK is also a constant factor smaller than that of general matrix multiplication. Inspired by this progress, we establish memory-independent parallel communication lower bounds for SYRK with smaller constants than general matrix multiplication, and we show that these constants are tight by presenting communication-optimal algorithms. The crux of the lower bound proof relies on extending a key geometric inequality to symmetric computations and analytically solving a constrained nonlinear optimization problem. The optimal algorithms use a triangular blocking scheme for parallel distribution of the symmetric output matrix and corresponding computation.
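
To make the computational saving from symmetry concrete, here is a minimal sequential sketch. It is not the paper's parallel algorithm. It computes only the lower triangle of C = A·Aᵀ and counts scalar multiplications. The function name `syrk_lower` and the multiplication counter are illustrative assumptions, not taken from the paper.

```python
def syrk_lower(A):
    """Compute C = A * A^T for an n-by-k matrix A given as a list of rows.

    Only entries with j <= i are computed; the upper triangle is filled in
    by symmetry. Returns (C, mults), where mults counts scalar multiplications.
    """
    n = len(A)
    k = len(A[0]) if n else 0
    C = [[0.0] * n for _ in range(n)]
    mults = 0
    for i in range(n):
        for j in range(i + 1):              # lower triangle only
            s = 0.0
            for p in range(k):
                s += A[i][p] * A[j][p]
                mults += 1
            C[i][j] = s
            C[j][i] = s                     # mirrored entry, no extra work
    return C, mults


A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]    # n = 3, k = 2
C, mults = syrk_lower(A)
gemm_mults = len(A) * len(A) * len(A[0])    # n^2 * k for a general product
```

For this 3-by-2 input the loop performs n(n+1)k/2 = 12 multiplications, against n²k = 18 for a general matrix product. The ratio approaches one half as n grows.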

References

[1]

A. Aggarwal, A. K. Chandra, and M. Snir. 1990. Communication Complexity of PRAMs. Theor. Comp. Sci., Vol. 71, 1 (1990). https://doi.org/10.1016/0304-3975(90)90188-N

[2]

H. Al Daas, G. Ballard, L. Grigori, S. Kumar, and K. Rouse. 2022. Tight Memory-Independent Parallel Matrix Multiplication Communication Lower Bounds. In SPAA 2022. https://doi.org/10.1145/3490148.3538552

[3]

G. Ballard, E. Carson, J. Demmel, M. Hoemmen, N. Knight, and O. Schwartz. 2014. Communication Lower Bounds and Optimal Algorithms for Numerical Linear Algebra. Acta Numerica, Vol. 23 (2014). https://doi.org/10.1017/S0962492914000038

[4]

G. Ballard, J. Demmel, O. Holtz, B. Lipshitz, and O. Schwartz. 2012. Strong Scaling of Matrix Multiplication Algorithms and Memory-Independent Communication Lower Bounds. In SPAA 2012. https://doi.org/10.1145/2312005.2312021

[5]

G. Ballard, J. Demmel, O. Holtz, and O. Schwartz. 2011. Minimizing Communication in Numerical Linear Algebra. SIAM J. Matrix Anal. Appl., Vol. 32, 3 (2011). https://doi.org/10.1137/090769156

[6]

O. Beaumont, P. Duchon, L. Eyraud-Dubois, J. Langou, and M. Vérité. 2022a. Symmetric Block-Cyclic Distribution: Fewer Communications Leads to Faster Dense Cholesky Factorization. In SC 2022. https://dl.acm.org/doi/abs/10.5555/3571885.3571923

[7]

O. Beaumont, L. Eyraud-Dubois, J. Langou, and M. Vérité. 2022b. I/O-optimal Algorithms for Symmetric Linear Algebra Kernels. In SPAA 2022. https://doi.org/10.1145/3490148.3538587

[8]

L. S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley. 1997. ScaLAPACK Users' Guide. SIAM. Also available from http://www.netlib.org/scalapack/.

[9]

S. Boyd and L. Vandenberghe. 2004. Convex Optimization. Cambridge University Press. https://web.stanford.edu/~boyd/cvxbook/

[10]

J. Bruck, Ching-Tien Ho, S. Kipnis, E. Upfal, and D. Weathersby. 1997. Efficient Algorithms for All-to-All Communications in Multiport Message-Passing Systems. IEEE Trans. on Par. and Dist. Sys., Vol. 8, 11 (1997). https://doi.org/10.1109/71.642949

[11]

E. Chan, M. Heimlich, A. Purkayastha, and R. van de Geijn. 2007. Collective Communication: Theory, Practice, and Experience. Conc. and Comp.: Prac. and Exper., Vol. 19, 13 (2007). https://doi.org/10.1002/cpe.1206

[12]

J. Demmel, D. Eliahu, A. Fox, S. Kamil, B. Lipshitz, O. Schwartz, and O. Spillinger. 2013. Communication-Optimal Parallel Recursive Rectangular Matrix Multiplication. In IPDPS 2013. https://doi.org/10.1109/IPDPS.2013.80

[13]

J. Dongarra, J.-F. Pineau, Y. Robert, Z. Shi, and F. Vivien. 2008. Revisiting Matrix Product on Master-Worker Platforms. Intl. J. Found. of Comp. Sci., Vol. 19, 06 (2008). https://doi.org/10.1142/S0129054108006303

[14]

J. W. Hong and H. T. Kung. 1981. I/O complexity: The Red-Blue Pebble Game. In STOC 1981. https://doi.org/10.1145/800076.802486

[15]

D. Irony, S. Toledo, and A. Tiskin. 2004. Communication Lower Bounds for Distributed-Memory Matrix Multiplication. J. Par. and Dist. Comp., Vol. 64, 9 (2004). https://doi.org/10.1016/j.jpdc.2004.03.021

[16]

G. Kwasniewski, M. Kabic, T. Ben-Nun, A. N. Ziogas, J. E. Saethre, A. Gaillard, T. Schneider, M. Besta, A. Kozhevnikov, J. VandeVondele, and T. Hoefler. 2021. On the Parallel I/O Optimality of Linear Algebra Kernels: Near-Optimal Matrix Factorizations. In SC 2021. https://doi.org/10.1145/3458817.3476167

[17]

L. H. Loomis and H. Whitney. 1949. An Inequality Related to the Isoperimetric Inequality. Bull. Amer. Math. Soc., Vol. 55, 10 (1949). https://doi.org/10.1090/S0002-9904-1949-09320-5

[18]

A. Olivry, J. Langou, L.-N. Pouchet, P. Sadayappan, and F. Rastello. 2020. Automated Derivation of Parametric Data Movement Lower Bounds for Affine Programs. In PLDI 2020. https://doi.org/10.1145/3385412.3385989

[19]

A. Olivry, G. Iooss, N. Tollenaere, A. Rountev, P. Sadayappan, and F. Rastello. 2021. IOOpt: Automatic Derivation of I/O Complexity Bounds for Affine Programs. In PLDI 2021. https://doi.org/10.1145/3453483

[20]

J. Poulson, B. Marker, R. A. van de Geijn, J. R. Hammond, and N. A. Romero. 2013. Elemental: A New Framework for Distributed Memory Dense Matrix Computations. ACM Trans. on Math. Soft., Vol. 39, 2, Article 13 (2013). https://doi.org/10.1145/2427023.2427030

[21]

T. M. Smith, B. Lowery, J. Langou, and R. A. van de Geijn. 2019. A Tight I/O Lower Bound for Matrix Multiplication. Technical Report. arXiv. https://doi.org/10.48550/arXiv.1702.02017

[22]

R. Thakur, R. Rabenseifner, and W. Gropp. 2005. Optimization of Collective Communication Operations in MPICH. Intl. J. High Perf. Comp. App., Vol. 19, 1 (2005). https://doi.org/10.1177/1094342005051521

Index Terms

Parallel Memory-Independent Communication Bounds for SYRK
1. Theory of computation
  1. Design and analysis of algorithms
    1. Parallel algorithms

Published In

SPAA '23: Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures

June 2023

504 pages

ISBN: 9781450395458

DOI: 10.1145/3558481

General Chair: Kunal Agrawal, Washington University in St. Louis, USA
Program Chair: Julian Shun, MIT, USA

Copyright © 2023 ACM.

Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Sponsors

SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
SIGARCH: ACM Special Interest Group on Computer Architecture
EATCS: European Association for Theoretical Computer Science

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SPAA '23: 35th ACM Symposium on Parallelism in Algorithms and Architectures

June 17 - 19, 2023

Orlando, FL, USA

Acceptance Rates

Overall Acceptance Rate 447 of 1,461 submissions, 31%
