
Scalable Parallel Matrix Multiplication on Distributed Memory Parallel Computers

Author: Keqin Li

Journal of Parallel and Distributed Computing, Volume 61, Issue 12

Pages 1709 - 1731

https://doi.org/10.1006/jpdc.2001.1768

Published: 01 December 2001

Abstract

Consider any known sequential algorithm for matrix multiplication over an arbitrary ring with time complexity $O(N^{\alpha})$, where $2 < \alpha \le 3$. We show that such an algorithm can be parallelized on a distributed memory parallel computer (DMPC) in $O(\log N)$ time by using $N^{\alpha}/\log N$ processors. Such a parallel computation is cost optimal and matches the performance of PRAM. Furthermore, our parallelization on a DMPC can be made fully scalable, that is, for all $1 \le p \le N^{\alpha}/\log N$, multiplying two $N \times N$ matrices can be performed by a DMPC with $p$ processors in $O(N^{\alpha}/p)$ time, i.e., linear speedup and cost optimality can be achieved in the range $[1..N^{\alpha}/\log N]$. This unifies all known algorithms for matrix multiplication on DMPC, standard or non-standard, sequential or parallel. Extensions of our methods and results to other parallel systems are also presented. For instance, for all $1 \le p \le N^{\alpha}/\log N$, multiplying two $N \times N$ matrices can be performed by $p$ processors connected by a hypercubic network in $O(N^{\alpha}/p + (N^{2}/p^{2/\alpha})(\log p)^{2(\alpha-1)/\alpha})$ time, which implies that if $p = O(N^{\alpha}/(\log N)^{2(\alpha-1)/(\alpha-2)})$, linear speedup can be achieved. Such a parallelization is highly scalable. The above claims result in significant progress in scalable parallel matrix multiplication (as well as solving many other important problems) on distributed memory systems, both theoretically and practically.
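
As a rough numerical illustration of the bounds quoted in the abstract (not code from the paper), the following Python sketch evaluates the computation term and the hypercubic communication term for concrete values. It assumes that all hidden constants are 1 and that logarithms are base 2; both are simplifications made only for this example, and every function name is hypothetical.

```python
import math


def compute_term(n: int, alpha: float, p: float) -> float:
    """Computation term N^alpha / p, with the hidden constant assumed to be 1."""
    return n ** alpha / p


def hypercube_comm_term(n: int, alpha: float, p: float) -> float:
    """Communication term (N^2 / p^(2/alpha)) * (log p)^(2(alpha-1)/alpha).

    Base-2 logarithm and unit constant are assumptions of this sketch.
    """
    return (n ** 2 / p ** (2 / alpha)) * math.log2(p) ** (2 * (alpha - 1) / alpha)


def dmpc_max_processors(n: int, alpha: float) -> float:
    """Upper end N^alpha / log N of the DMPC range of full scalability."""
    return n ** alpha / math.log2(n)


def hypercube_speedup_is_linear(n: int, alpha: float, p: float) -> bool:
    """True when the communication term does not exceed the computation term."""
    return hypercube_comm_term(n, alpha, p) <= compute_term(n, alpha, p)


if __name__ == "__main__":
    n, alpha = 64, 3.0  # standard cubic algorithm on 64 x 64 matrices
    for p in (8.0, dmpc_max_processors(n, alpha)):
        print(p, compute_term(n, alpha, p), hypercube_comm_term(n, alpha, p),
              hypercube_speedup_is_linear(n, alpha, p))
```

Under these assumptions, a small processor count keeps the communication term below the computation term, while the largest processor count allowed on a DMPC does not. This is in line with the abstract's narrower range $p = O(N^{\alpha}/(\log N)^{2(\alpha-1)/(\alpha-2)})$ for linear speedup on a hypercubic network.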


Index Terms

Scalable Parallel Matrix Multiplication on Distributed Memory Parallel Computers
1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel algorithms
2. Theory of computation
  1. Design and analysis of algorithms
    1. Parallel algorithms
  2. Models of computation
    1. Concurrency
      1. Parallel computing models

Index terms have been assigned to the content through auto-classification.



Published In

Journal of Parallel and Distributed Computing, Volume 61, Issue 12

December 2001

144 pages

ISSN: 0743-7315

Copyright © Elsevier Science (USA).

Publisher

Academic Press, Inc.

United States

Cited By

Katsinis C (2018). Merging, sorting and matrix operations on the SOME-bus multiprocessor architecture. Future Generation Computer Systems, 20:4 (643-661). DOI: 10.1016/S0167-739X(03)00129-8. Online publication date: 29-Dec-2018.
https://dl.acm.org/doi/10.1016/S0167-739X%2803%2900129-8

Li K (2018). Fast and highly scalable parallel computations for fundamental matrix problems on distributed memory systems. The Journal of Supercomputing, 54:3 (271-297). DOI: 10.1007/s11227-009-0319-0. Online publication date: 31-Dec-2018.
https://dl.acm.org/doi/10.1007/s11227-009-0319-0

Li K (2007). Analysis of Parallel Algorithms for Matrix Chain Product and Matrix Powers on Distributed Memory Systems. IEEE Transactions on Parallel and Distributed Systems, 18:7 (865-878). DOI: 10.1109/TPDS.2007.1027. Online publication date: 1-Jul-2007.
https://dl.acm.org/doi/10.1109/TPDS.2007.1027

Li K (2005). Fast and Scalable Parallel Matrix Computations on Distributed Memory Systems. Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01. DOI: 10.1109/IPDPS.2005.221. Online publication date: 4-Apr-2005.
https://dl.acm.org/doi/10.1109/IPDPS.2005.221

Sim L, Leedham G, Jian L, Schroder H (2003). Fast solution of large N × N matrix equations in an MIMD-SIMD hybrid system. Parallel Computing, 29:11-12 (1669-1684). DOI: 10.1016/j.parco.2003.05.011. Online publication date: 1-Nov-2003.
https://dl.acm.org/doi/10.1016/j.parco.2003.05.011
