Analysis of Parallel Algorithms for Matrix Chain Product and Matrix Powers on Distributed Memory Systems

Published: 01 July 2007

Abstract

Given N matrices A_{1}, A_{2}, \ldots, A_{N}, each of size N \times N, the matrix chain product problem is to compute A_{1} \times A_{2} \times \cdots \times A_{N}. Given an N \times N matrix A, the matrix powers problem is to compute the first N powers of A, that is, A, A^{2}, A^{3}, \ldots, A^{N}. We solve both problems on distributed memory systems (DMSs) with p processors that support one-to-one communication in T(p) time. Assume that the fastest sequential matrix multiplication algorithm has time complexity O(N^{\alpha}), where the best value of \alpha currently known is less than 2.3755, and let p be arbitrarily chosen in the range 1 \leq p \leq N^{\alpha + 1}/(\log N)^{2}. We show that a DMS with p processors can solve the two problems in

T_{\rm chain}(N,p) = O\left(\frac{N^{\alpha + 1}}{p} + T(p)\left(\frac{N^{2(1 + 1/\alpha)}}{p^{2/\alpha}}\left(\log^{+}\frac{p}{N}\right)^{1 - 2/\alpha} + \log^{+}\left(\frac{p \log N}{N^{\alpha}}\right)\log N\right)\right)

and

T_{\rm power}(N,p) = O\left(\frac{N^{\alpha + 1}}{p} + T(p)\left(\frac{N^{2(1 + 1/\alpha)}}{p^{2/\alpha}}\left(\log^{+}\frac{p}{2\log N}\right)^{1 - 2/\alpha} + (\log N)^{2}\right)\right)

time, respectively, where \log^{+}x = \log x if x \geq 1 and \log^{+}x = 1 if 0 < x < 1. We also instantiate these results on several typical DMSs and show that computing matrix chain products and matrix powers is fully scalable on distributed memory parallel computers (DMPCs), highly scalable on DMSs with hypercubic networks, and not highly scalable on DMSs with mesh and torus networks.
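For intuition, both problems combine matrices along a chain using only associativity, which admits a balanced-tree reduction (for the chain product) and a parallel-prefix doubling scan (for the powers, viewed as prefix products of the chain A, A, \ldots, A). The NumPy sketch below is a sequential illustration of those two combining patterns, not the paper's DMS algorithm; function names are ours.

```python
import numpy as np

def chain_product(mats):
    # Balanced binary-tree reduction: multiply adjacent pairs at each
    # level, halving the chain length. This is the shape a parallel
    # reduction takes; here each level runs sequentially.
    mats = list(mats)
    while len(mats) > 1:
        mats = [mats[i] @ mats[i + 1] if i + 1 < len(mats) else mats[i]
                for i in range(0, len(mats), 2)]
    return mats[0]

def all_powers(A, n):
    # Prefix products of n copies of A give A, A^2, ..., A^n.
    # Computed by a doubling (Hillis-Steele) scan: in round s, entry i
    # absorbs the partial product sitting s positions to its left.
    P = [A.copy() for _ in range(n)]
    step = 1
    while step < n:
        P = [P[i] if i < step else P[i - step] @ P[i] for i in range(n)]
        step *= 2
    return P
```

Both routines perform O(log N) rounds of matrix multiplications, which is the source of the logarithmic factors in the bounds above; with one multiplication per processor group per round, the same schedule parallelizes directly.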


Cited By

  • (2024) "On Efficient Large Sparse Matrix Chain Multiplication," Proceedings of the ACM on Management of Data, vol. 2, no. 3, pp. 1-27, doi: 10.1145/3654959. Online publication date: 30-May-2024.
  • (2018) "Fast and highly scalable parallel computations for fundamental matrix problems on distributed memory systems," The Journal of Supercomputing, vol. 54, no. 3, pp. 271-297, doi: 10.1007/s11227-009-0319-0. Online publication date: 31-Dec-2018.


Published In

IEEE Transactions on Parallel and Distributed Systems, Volume 18, Issue 7
July 2007
159 pages

Publisher

IEEE Press


Author Tags

  1. Cost optimality
  2. distributed memory parallel computer
  3. distributed memory system
  4. dynamic processor allocation
  5. hypercubic network
  6. matrix chain product
  7. matrix multiplication
  8. matrix power
  9. mesh
  10. scalability
  11. speedup
  12. torus

Qualifiers

  • Research-article


