Analysis of Parallel Algorithms for Matrix Chain Product and Matrix Powers on Distributed Memory Systems

Published: 01 July 2007

Abstract

Given N matrices A_{1}, A_{2}, \ldots, A_{N}, each of size N \times N, the matrix chain product problem is to compute A_{1} \times A_{2} \times \cdots \times A_{N}. Given an N \times N matrix A, the matrix powers problem is to compute the first N powers of A, that is, A, A^{2}, A^{3}, \ldots, A^{N}. We solve both problems on distributed memory systems (DMSs) with p processors that support one-to-one communication in T(p) time. Assume that the fastest sequential matrix multiplication algorithm has time complexity O(N^{\alpha}), where the best value of \alpha currently known is less than 2.3755, and let p be arbitrarily chosen in the range 1 \leq p \leq N^{\alpha + 1}/(\log N)^{2}. We show that a DMS with p processors can solve the two problems in

T_{\rm chain}(N,p) = O\left(\frac{N^{\alpha + 1}}{p} + T(p)\left(\frac{N^{2(1 + 1/\alpha)}}{p^{2/\alpha}}\left(\log^{+}\frac{p}{N}\right)^{1 - 2/\alpha} + \log^{+}\left(\frac{p \log N}{N^{\alpha}}\right)\log N\right)\right)

and

T_{\rm power}(N,p) = O\left(\frac{N^{\alpha + 1}}{p} + T(p)\left(\frac{N^{2(1 + 1/\alpha)}}{p^{2/\alpha}}\left(\log^{+}\frac{p}{2\log N}\right)^{1 - 2/\alpha} + (\log N)^{2}\right)\right)

time, respectively, where \log^{+}x = \log x if x \geq 1 and \log^{+}x = 1 if 0 < x < 1. We also instantiate these results on several typical DMSs and show that computing matrix chain products and matrix powers is fully scalable on distributed memory parallel computers (DMPCs), highly scalable on DMSs with hypercubic networks, and not highly scalable on DMSs with mesh and torus networks.
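For intuition, both problems combine matrices along a chain using only associativity, which admits a balanced-tree reduction (for the chain product) and a parallel-prefix doubling scan (for the powers, viewed as prefix products of the chain A, A, \ldots, A). The NumPy sketch below is a sequential illustration of those two combining patterns, not the paper's DMS algorithm; function names are ours.

```python
import numpy as np

def chain_product(mats):
    # Balanced binary-tree reduction: multiply adjacent pairs at each
    # level, halving the chain length. This is the shape a parallel
    # reduction takes; here each level runs sequentially.
    mats = list(mats)
    while len(mats) > 1:
        mats = [mats[i] @ mats[i + 1] if i + 1 < len(mats) else mats[i]
                for i in range(0, len(mats), 2)]
    return mats[0]

def all_powers(A, n):
    # Prefix products of n copies of A give A, A^2, ..., A^n.
    # Computed by a doubling (Hillis-Steele) scan: in round s, entry i
    # absorbs the partial product sitting s positions to its left.
    P = [A.copy() for _ in range(n)]
    step = 1
    while step < n:
        P = [P[i] if i < step else P[i - step] @ P[i] for i in range(n)]
        step *= 2
    return P
```

Both routines perform O(log N) rounds of matrix multiplications, which is the source of the logarithmic factors in the bounds above; with one multiplication per processor group per round, the same schedule parallelizes directly.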


Cited By

  • (2024) "On Efficient Large Sparse Matrix Chain Multiplication," Proceedings of the ACM on Management of Data, vol. 2, no. 3, pp. 1-27, doi: 10.1145/3654959. Online publication date: 30-May-2024.
  • (2018) "Fast and highly scalable parallel computations for fundamental matrix problems on distributed memory systems," The Journal of Supercomputing, vol. 54, no. 3, pp. 271-297, doi: 10.1007/s11227-009-0319-0. Online publication date: 31-Dec-2018.


Published In

IEEE Transactions on Parallel and Distributed Systems, Volume 18, Issue 7
July 2007
159 pages

Publisher

IEEE Press


Author Tags

  1. Cost optimality
  2. distributed memory parallel computer
  3. distributed memory system
  4. dynamic processor allocation
  5. hypercubic network
  6. matrix chain product
  7. matrix multiplication
  8. matrix power
  9. mesh
  10. scalability
  11. speedup
  12. torus

Qualifiers

  • Research-article


